Introduction



The papers contained in this issue of Working Papers in Linguistics
deal mainly with experimental topics. Units in Speech Perception, by
Z. S. Bond, constitutes her dissertation. The next three papers, by
L. Shockey, R. Gregorski, and I. Lehiste, deal with various aspects
of the temporal structure of spoken language. M. V. Wendell's paper,
"Relative Intelligibility of Five Dialects of English," is her
undergraduate honors thesis. The volume concludes with three papers devoted
to specific languages. Of these, the papers on Hungarian and Estonian
are based on experimental techniques; the paper on Latvian and Lithuanian
deals with historical phonology. Z. S. Bond's dissertation, I. Lehiste's
paper, and the two papers written jointly by L. Shockey, R. Gregorski,
and I. Lehiste were partly supported by the National Science Foundation
under Grant No. GN-534.1. The other papers are published with support
from the Graduate School of The Ohio State University.
Units in Speech Perception*
INTRODUCTION



Speech perception, as a field of empirical investigation, is 
very much involved with linguistics: a model of speech perception is 
crucially dependent on a model of language, since the model of 
language tells the perception theorist what it is that the listener 
has to perceive. 

Thus, historically, there has been a tendency for models of speech
perception to be related to the current linguistic models of language.
The early models of speech perception are not specific enough, by
current standards, simply because the model of language that the
theorist was dealing with was not a very complex model: language was
conceived to be something like a series of words strung together.

As more complicated and more precise linguistic models became
current, the theorizing about speech perception also became more
precise and more experimentally oriented. Thus, structural linguistics
of the 1940s and 1950s led to experimental work which assumed that
the phoneme, or some unit very much like a phoneme, was the perceptual
unit in phonology. The problem in understanding speech perception
was then seen as discovering how a listener can 'translate' or 'decode'
a continuous acoustic signal into discrete phonemes. And, though
alternative suggestions have been made, most theorists still assume 
that the incoming speech signal is represented in some phoneme-like 
units as the first step in speech perception. 
Experimental work on higher-level perceptual units, related to
the syntactic structure of a sentence, has begun quite recently. Some 
early theorists have advanced ideas of what is involved in understanding 
sentences, but, again, the work could not lead to any precise theoretical 
formulations until a fairly adequate theory of syntax became available; 
thus, almost all empirical studies involving the perception of 
syntactic units assume that the syntactic relationships described in 
transformational grammar are involved in speech perception at some level. 
However, the experiments have tended not to separate perceptual effects 
from memory effects; and there is no agreement — such as implicitly 
exists in theories of the perception of phonological segments — whether 
there are some syntactic units involved in perception and, if so, what 
these units are. 

Generative phonology, which does not assume any unit equivalent to 
the traditional phoneme, has not so far led to any experimental work 
on speech perception, though it is intimately related to models of 
speech perception involving analysis-by-synthesis.

In this study, the attempt is made to examine some units that 
function in speech perception. The first chapter contains a survey of 
models that have been proposed to account for speech perception. The 
survey includes some models because of the historical background they 
provide, even though the models make no specific predictions about 
units in speech perception. More recent models make certain predictions 
about perceptual units, and these will be pointed out when the 
theoretical implications of the perceptual models are discussed. 

Three experiments are reported. The first experiment involves 
a subject's ability to make use of sub-phonemic phonetic differences. 
Subjects are asked to identify productions of monomorphemic and
bimorphemic words of identical phonemic shape, e.g., lax vs. lacks. The
purpose of the experiment is two-fold: to determine what a 'baseline'
for perception is (what is the least amount of phonetic difference
that can be used for linguistic purposes), and to determine whether the
traditional phoneme, which is often accepted as the perceptual unit,
defines a lower limit below which a listener cannot make use of phonetic
differences.

The second experiment involves the perception of obstruent 
clusters. Subjects are asked to identify words with reversible 
obstruent clusters, such as task vs. tax, in the presence of noise. The
purpose of the experiment is to determine whether consonant clusters
are coded 'phoneme-by-phoneme', as the traditional assumptions would
imply, or whether subjects employ some alternative perceptual mechanisms.

The third experiment seeks to determine perceptual units in syntax. 
Subjects are asked to respond, by pressing a button, when they hear a 
'click' in a sentence. From reaction time to the click, the effects
of a phonologically defined phrase on perceptual segmentation can be 
determined. 

Finally, the implications of the experimental studies for models
of speech perception are discussed.



CHAPTER ONE 



MODELS OF SPEECH PERCEPTION 



The purpose of this chapter is to provide some historical background 
and to present the current ideas of theorists attempting to account for 
speech perception. Not all of the models that will be discussed in 
this chapter make specific predictions about what units are involved 
in speech perception, but they are included simply because many are 
interesting in themselves or for historical reasons. 

No attempt will be made to evaluate the adequacy of any of these 
models in this chapter. Rather, the models that still hold promise 
will be discussed in the last chapter in terms of the theoretical 
implications of the empirical studies reported in this work. 

Models of speech perception have been classified under the following 
headings: behavioristic models, information theory models, motor 
theories, analysis by synthesis models, models proposing filtering
as a primary device, and models depending on perceptual strategies.

Behaviorism 

There is a long behaviorist tradition of theories of speech 
perception. Appropriately enough, it begins with J. B. Watson (1930). 
Watson's general behaviorist position is well known, and his views
of language, not developed in any great detail, follow from it
clearly. Since he refuses to postulate any "mentalistic constructs,"
he discusses language in observable, physicalistic terms. Language
is simply a "manipulative habit of the vocal tract" (Watson, p. 225).
When a person learns to speak, he develops a conditioned response — 
some movement of the vocal tract — for every object and situation in 
his external environment. These conditioned responses are equivalent 
to words. Such internalized kinaesthetic responses can call out 
further responses in the same way as the objects for which they serve
as substitutes do; because of these kinaesthetic verbal substitutes, 
a person carries the world around with him; he can manipulate the world 
(think) by means of series of motor responses. 

Sentences, and other language sequences, are accounted for by
the following example: a child hears the bedtime prayer "Now I lay
me down to sleep ..." The first few times he hears it, the first word
of the sentence, "now," makes the child produce the motor response
which is his internal equivalent of "now"; similarly, "I" leads to
internalized "I," etc. After repeated experiences, the motor response 
"now" will lead directly to the motor response "I," with no necessary 
intervening step. At this point, the child has learned the sentence. 
Spontaneous speech, Watson believes, follows essentially the same 
principles: some stimulus touches off old verbal organization. 

Speech perception offers no particular difficulty: the incoming 
stimulus makes the listener form the equivalent kinaesthetic-motor 
responses. Watson, therefore, is postulating a simple motor theory 
of speech perception, involving incipient muscle activity. 

In Language, Bloomfield (1933) offers a much more sophisticated
analysis of language, but his outlook is essentially behavioristic. 
Bloomfield analyzes an event involving speech by means of a little
scene with two characters, Jack and Jill. Externally, the action is
quite simple: Jack and Jill are walking along a road; Jill makes a 
series of noises with her vocal tract; Jack climbs a fence, and 
brings Jill an apple from a nearby tree. 

Looking at the scene more analytically, there are a number of 
practical events preceding the act of speech. These practical events 
are quite complex, but taken together, they can be considered as a 
stimulus for Jill. As a speaking human, Jill has a choice: she can 
make a direct response (go get the apple), or she can make a linguistic 
substitute response (ask Jack for the apple). For Jack, the speech is 
a substitute linguistic stimulus, which makes him produce a particular 
response. 

Essentially, speech enables stimuli and responses to occur in 
different individuals, as indicated in the following diagram: 

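S → r ..... s → R

(S is the practical stimulus to the speaker, r the speaker's linguistic
substitute response, s the linguistic substitute stimulus to the hearer,
and R the hearer's practical response.)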

Bloomfield is not very specific in discussing what is involved 
in Jack's reception of the message. In relation to phonology, Bloomfield
argues that speakers of a language habitually and conventionally 
discriminate some features of sound and ignore others; presumably, 
then, there are distinctive properties of sound to which Jack is 
sensitive. These encode the message. 

The behaviorist tradition is carried on in the 1950s by the
psychologists B. F. Skinner (1957), O. H. Mowrer (1954), and C. E.
Osgood (1963).

Mowrer does not offer a complete theory of language, but an
analysis of declarative sentences in stimulus-response (henceforth S-R)
terminology. Essentially, he suggests that a sentence is an arrangement
for conditioning the meaning reaction produced by the predicate
to the stimulation aroused by the meaning reaction elicited by the 
subject. In other words, a subject-predicate sentence is to be 
considered a conditioning device. 

The conditioning device operates in the following way. When 
the listener hears any word in his vocabulary, there is aroused in 
him a unique "meaning response." When he hears a sentence, for 
example, "Tom is a thief," first there is aroused in the listener 
a "meaning response" which is his internal representation of the word
"Tom" as well as of the physical Tom. Then, because a sentence is a 
conditioning device, to this "meaning response" is added the "meaning 
response" of "thief." As a consequence, the listener comes to respond 
differently to the physical Tom; he will avoid him, perhaps, and not 
lend him money. In short, he will treat Tom as a thief. 

One of the most thorough attempts to explain language behavior 
in S-R terms is B. F. Skinner's book Verbal Behavior (1957). Skinner
declines to speculate about non-observable language phenomena; rather,
he sees the task of the science of verbal behavior as determining the
laws governing verbal behavior. These laws concern the predictability 
and control of particular verbal responses. That is, the task is 
accomplished when it is possible to predict what a person will say. 

Because of this goal, and because he rejects non-observables, 
Skinner has little to say about internal phenomena such as perception. 
He does offer a few suggestions. First, Skinner defines a unit of 
verbal behavior as anything that is under the independent control of 
a manipulable (stimulus) variable. This unit can be as large as a
whole phrase, such as "How are you?", or as small as a change in
fundamental frequency, used to ask a question. In order for language 
to function at all, these units must lead to different responses by 
listeners. Secondly, Skinner points out that at any time in sequential 
verbal behavior, e.g. sentences, what has been said before sharply 
limits what will be said next: there is redundancy in language. 

Presumably, the listener can also take advantage of such redundancy. 

But Skinner does not attempt to present any theory of speech 
perception; the few suggestions that he makes do not detract from his 
basic assumption that perception can not be separated from responses 
in any meaningful way. 

C. E. Osgood also offers a behavioristic theory of speech (Osgood,
1963), which he calls a three-stage mediation model. Unlike Skinner,
Osgood is quite ready to postulate mechanisms internal to the speaker
and listener. Rather than being concerned only with observable stimuli and
responses, Osgood wants to fill the "black box" of the organism with
intervening S-R constructs. Osgood's three-stage model is represented below.



Fig. 1. Three-stage mediation-integration model.
(By permission of Charles E. Osgood.)

Osgood's model differs from Skinnerian S-R models in two ways.
First, Osgood postulates mediating responses (r_m). These internal
r_m's are a fractional, easily differentiable part of an original
overt response. Since the original response was elicited by some stimulus,
the fractional r_m becomes an internal representation of the stimulus.
The internal r_m's, in turn, can lead to various instrumental acts.
Essentially, Osgood hopes to account for meaning by these internal
representations. These internal representations, however, are quite
complex; basically, Osgood holds that words are coded by means of a
simultaneous bundle of semantic features (Osgood, 1963).

Secondly, Osgood postulates stimulus integration (S-S learning) 
and response integration (R-R learning) to account for the perceptual 
and motor complexity found in speech. He argues that, in perception, 
the greater the frequency with which stimulus events have been paired in 
the input experience of the organism, the greater will be the tendency 
for their central neural correlates to activate each other. In other 
words, a partial sensory input will become adequate to trigger the 
whole; it will lead to what the Gestalt psychologists called "closure."

This closure principle can only operate if there are perceptual 
units which function as wholes. These units must meet three criteria: 
they must be highly redundant, they must be fairly frequent in 
occurrence, and they must not exceed certain temporal limits. The 
most likely perceptual units are words.

In perceiving a sentence, the phonetic information is adequate
to trigger the phonological representation of a particular word, e.g.,
play. The context of the sentence then determines the semantic
interpretation of the word. Given, for example, the sentence "The
play got rave reviews," the word play will be interpreted as a noun
on the basis of the frame Determiner ___ Verb. The word reviews
will eliminate the interpretation of play in the sense of gambling.
On the basis of such linguistic information and on the basis of
non-linguistic context, the listener will arrive at the intended message.

More recently, psychologists, even though they may consider
themselves behaviorists, have broken away from S-R formulations
altogether.

In his very interesting book, The Senses Considered as Perceptual
Systems, James J. Gibson (1966) emphasizes the information contained
in stimulation, rather than the discrete responses of separate 
sensory systems. Therefore, he rejects the traditional decomposition 
of a complex sound into a combination of pitch, duration, and loudness 
specifications in order to describe the stimulus. He considers it a 
better approach to look for higher-order variables characteristic of 
the stimulus:

"In meaningful sounds, these variables can be combined
to yield higher-order variables of staggering complexity.
But these mathematical complexities seem nevertheless to
be the simplicities of auditory information, and it is
just these variables that are distinguished naturally by
an auditory system." (p. 87)

In other words, it is a mistake to think that the perceptual system 
"builds up" complex stimuli from simple components; rather, complex 
stimuli are responded to directly. 

The higher-order variables have not been studied for most types 
of meaningful sound, but there have been a few attempts to study 
such variables in the acoustic speech signal. According to Gibson,
frequency ratios and the relational patterns of frequencies are the 
invariants provided by the speech signal. 
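Gibson's claim that frequency ratios, rather than absolute frequencies, carry the invariant information can be illustrated with a toy calculation. The formant values below are invented, and the assumption that one speaker's formants are a uniform rescaling of another's is a deliberate simplification; real differences between speakers are not uniform.

```python
import math

def frequency_ratios(freqs):
    """Return successive ratios F2/F1, F3/F2, ... for a list of frequencies in Hz."""
    return [upper / lower for lower, upper in zip(freqs, freqs[1:])]

# Invented formant frequencies (Hz) for one vowel; not measured data.
speaker_a = [500.0, 1500.0, 2500.0]
# Hypothetical second speaker: every formant shifted by the same factor.
speaker_b = [f * 1.25 for f in speaker_a]

ratios_a = frequency_ratios(speaker_a)
ratios_b = frequency_ratios(speaker_b)
# True when the two ratio patterns agree, even though absolute values differ.
same_pattern = all(math.isclose(a, b) for a, b in zip(ratios_a, ratios_b))
```

Under this uniform-scaling assumption the absolute frequencies of the two speakers differ, while the ratio pattern stays the same.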

The pick-up of phonemes is a direct one-stage process; however, 
the apprehension of things referred to — a semantic decoding of the 
speech signal — is a two-stage process since not only the speech sounds 
but what they stand for have to be apprehended. "The acoustic sounds 
of speech specify the consonants, vowels, syllables, and words of 
speech; the parts of speech in turn specify something else." (p. 91). 

The structure of speech can be analyzed at various levels, 
hierarchically organized, and each level has some unit appropriate to 
it: at each level, there is an appropriate stimulus unit for the 
perceptual system. 

Information Theory 

During the 1950s, information theory provided conceptual structures
by which all types of communication — defined as the transmission of 
information — could be analyzed. Theorists concerned with speech also 
tried to apply the concepts of information theory to their field, and 
developed models of speech communication. These speech communication 
models discussed both a speaker and a hearer, but tended to emphasize 
the former. Many models of the speech communication system were 
proposed; these are summarized by Grant Fairbanks (1954), who also
presents one of the most detailed analyses of speech from this point
of view. However, most of his discussion concerns speech production.
Perception is discussed almost exclusively in terms of its role in
feedback: the speaker monitors his own output and changes his output
when it does not meet the criteria set by the input to the speech
systems.

Fairbanks' model is reproduced in Fig. 2. Essentially, the model
offers the following analysis of speech production: an input signal
to the speech mechanisms results in some output; this output is compared
with the stored input; if the output has not yet reached the target
specified by the input, an error signal is sent out to adjust the
output.
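The comparator loop just described can be sketched as a toy servo. Reducing the target to a single number, and the particular gain and tolerance values, are illustrative assumptions rather than parameters from Fairbanks' paper; the sketch shows only the cycle of output, comparison, and error correction, with the intermediate values standing in for the transitions that Fairbanks treats as by-products.

```python
def servo_speak(target, output=0.0, gain=0.5, tolerance=0.01, max_steps=100):
    """Toy closed-loop controller: compare output with a stored target and
    feed back an error signal until the output is close enough."""
    trajectory = [output]
    for _ in range(max_steps):
        error = target - output        # comparator: stored input minus current output
        if abs(error) <= tolerance:    # target reached: no further correction needed
            break
        output += gain * error         # error signal adjusts the effector's output
        trajectory.append(output)
    return trajectory

path = servo_speak(target=1.0)
```

With these assumed values, each correction halves the remaining error, so the output approaches the target in a series of intermediate steps rather than in one jump.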

There are several interesting points concerning the speech model.
First, Fairbanks postulates a "unit of control." Although he does not
go into detail, he suggests that the unit of speech control is not to
be identified with any currently recognized phonetic unit; rather, the
unit of speech control is a "semi-periodic, relatively long, articulatory
cycle" (p. 138). Secondly, the model implies that certain steady-state
outputs are the goals of the speech mechanism and that transitions are
only by-products. In Fairbanks' words:

"It is to be emphasized that the steady states are the primary
objectives, the targets. The transitions are useful incidents
on the way to the targets. The roles of both are probably
very analogous when the dynamic speech output is perceived
by an independent listener." (p. 139)

Fairbanks has little to say about speech perception directly. 
Presumably, perception follows the path described for feedback. Whether 
the message is analyzed directly or whether it is compared in the
comparator with a possible message (as in motor theories of speech
perception) is not specified in Fairbanks' model.
Fig. 2. Model of a closed cycle control system for speaking.
(Grant Fairbanks, "A Theory of the Speech Mechanism as
a Servo-System," Journal of Speech and Hearing Disorders
19 (1954). By permission of the American Speech and
Hearing Association.)

Although it uses concepts from information theory, Hockett's model
of speech communication (1956) is much more linguistic in orientation
than Fairbanks' model, at least in the sense that linguistic terminology
is applied to various processes. However, Hockett cautions that the
'phoneme' and 'morpheme' of internal circuitry are not to be strictly
equated with the phoneme and morpheme of linguistics.

Hockett's model (Fig. 3) represents the internal mechanisms
necessary for Jill to communicate with Jack. First, a sequence of 
morphemes is emitted by GHQ (grammatical headquarters); then the 
morphemes are recoded into a discrete flow of phonemes by morphophonemic 
processes. Finally, the phonemes become a continuous speech signal in 
the "speech transmitter." The speaker monitors his own speech signal, 
but he does not use feedback to adjust the output continuously. 

The listener uses the same communications system, but the speech 
receiver sends the signal through in the other direction: the speech
receiver picks up the signal and transduces it into a discrete flow 
of phonemes; the phonemes are assembled into morphemes and submitted 
to GHQ. A listener understands a message when his GHQ is going through 
the same "states" as the speaker’s GHQ. Hockett also suggests that 
a listener decodes an incoming signal partly by comparing it with the 
articulatory motions that the listener would have to make to produce 
the signal. 




Fig. 3. A model of speech communication.
(Charles Hockett, A Manual of Phonology, 1955, by
permission of Indiana University Publications in
Anthropology and Linguistics and Prof. Charles F.
Hockett.)



Filtering 

In his article "On the Process of Speech Perception," J. C. R. 
Licklider (1952) analyzes the process of speech perception into three 
main operations: translation of the speech signal into a form suitable 
for the nervous system, identification of speech elements, and 
comprehension of meaning. 
The first process is performed by the cochlea; the signal is
mechanically analyzed in terms of frequency and intensity in such a 
way that the output is somewhat similar to a sound spectrogram. 
However, since the frequency analysis of the cochlea is not very 
selective, the signal is sharpened further up the auditory pathways. 
Thus, the input to the perceptual mechanism consists of a sharpened 
frequency analysis of the acoustic signal, coded in terms of origin 
on the cochlea, and intensity, coded in terms of density of discharge. 
Furthermore, there is a representation of the fundamental frequencies 
of the periodic components of the acoustic signal. 

The second process, identification of speech elements, could be
performed by one of two mechanisms, a correlator or a filter. A 
correlator is essentially a device for matching the incoming signal 
against an internally stored representation (or a representation 
created by rules). A filter, on the other hand, has the required 
patterns built into its structure; the identification of the incoming 
signal is made on the basis of which filter the signal passes through 
most successfully. Although the choice is tentative, Licklider favors 
the filter model as the device which identifies speech elements. 
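A correlator of the kind Licklider describes can be sketched as template matching. The three templates below are invented four-band spectra, and cosine similarity is an assumed matching measure that Licklider does not specify. A filter in his sense would build the same patterns into fixed channels rather than consulting them as stored data, but would likewise report which pattern the input fits best.

```python
import math

def normalized_correlation(x, y):
    """Cosine similarity between two equal-length spectral vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Hypothetical stored patterns: relative energy in four coarse frequency bands.
TEMPLATES = {
    "i": [0.9, 0.1, 0.2, 0.8],
    "a": [0.3, 0.9, 0.8, 0.1],
    "u": [0.9, 0.6, 0.1, 0.1],
}

def correlate(signal, templates=TEMPLATES):
    """Return the label whose stored pattern best matches the input, plus all scores."""
    scores = {label: normalized_correlation(signal, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = correlate([0.8, 0.2, 0.3, 0.7])
```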

Comprehension, on the other hand, can best be explained as an 
active process. Therefore, Licklider argues that comprehension of 
meaning involves matching the input to a set of internal patterns. 
Although he does not say this, Licklider would probably maintain that 
these patterns are generated as needed. 

Licklider's model, therefore, is very much like analysis-by-synthesis
for the processing of sentences. For smaller units, however,
Licklider prefers the more direct analysis provided by filtering.



A "filtering" theory, differing in interesting ways from Licklider's,
has been recently developed by Wayne A. Wickelgren (1969a, 1969b).
Previous theories have assumed that, no matter how speech is processed,
the phoneme is the primary unit of coding in perception. Wickelgren
proposes a theory in which the perception and production of speech are
coded in some unit that is more closely related to the traditional
allophone. He calls this theory context-sensitive coding.

"I define a context-sensitive code for words to consist of an 
unordered set of symbols for every word, where each symbol 
restricts the choice of its left and right neighbors 
sufficiently to determine them uniquely out of the unordered 
set for any given word. In this case, the unordered set, in 
conjunction with the dependency rules, contains all the 
information necessary to reconstruct a unique ordering of the 
symbols for each word." (1969b, p. 86)
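The definition can be made concrete with a small sketch in which each symbol is a (left neighbor, segment, right neighbor) triple and "#" marks a word boundary. Letters stand in for phonetic segments, and the helper names encode and decode are hypothetical. The ordering is recovered only by chaining triples whose contexts agree, even though the code itself is an unordered set.

```python
BOUNDARY = "#"

def encode(word):
    """Code a word as an unordered set of (left, segment, right) symbols."""
    padded = BOUNDARY + word + BOUNDARY
    return frozenset(
        (padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)
    )

def decode(code):
    """Recover the segment order by chaining symbols whose contexts agree."""
    remaining = set(code)
    # Start with the symbol whose left context is the word boundary.
    current = next(s for s in remaining if s[0] == BOUNDARY)
    remaining.remove(current)
    out = [current[1]]
    while current[2] != BOUNDARY:
        # The next symbol is centered on the current right neighbor and
        # has the current segment as its left context.
        current = next(s for s in remaining if s[0] == current[1] and s[1] == current[2])
        remaining.remove(current)
        out.append(current[1])
    return "".join(out)

stop_code = encode("stop")
```

The chaining works only if no triple occurs twice within a word; a word like banana repeats the triple a-n-a, so its unordered set loses information. This corresponds to the uniqueness condition in Wickelgren's definition.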

In speech perception, context-sensitive coding would work in the 
following way. Each context-sensitive allophone of the language would 
have a unique internal representative. This internal representative 
would be activated by some conjunction of acoustic features, occurring 
over a period of time as long as a few hundred milliseconds. All 
allophone representatives would be examining the acoustic input in 
parallel, but only a few would be activated in response to the input. 
After the set of allophones has been determined, the word representative
which is most closely associated with the set of allophones can be 
selected. 
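The final selection step might be sketched as choosing the lexical entry whose context-sensitive symbols best overlap the activated set. Wickelgren does not commit to a particular measure here; Jaccard overlap is a stand-in, and the four-word lexicon and the simulated noisy activation are hypothetical.

```python
def encode(word, boundary="#"):
    """Context-sensitive code: unordered set of (left, segment, right) triples."""
    padded = boundary + word + boundary
    return frozenset(
        (padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)
    )

# Hypothetical lexicon of words built from the same four segments.
LEXICON = {w: encode(w) for w in ["stop", "spot", "pots", "tops"]}

def select_word(activated, lexicon=LEXICON):
    """Return the word whose symbol set has the highest Jaccard overlap with the activated set."""
    def overlap(word):
        units = lexicon[word]
        return len(units & activated) / len(units | activated)
    return max(lexicon, key=overlap)

# Simulated input: one symbol of "spot" is missed and one spurious symbol is activated.
activated = (encode("spot") - {("o", "t", "#")}) | {("x", "y", "z")}
```

Even with a missing and a spurious symbol, "spot" shares more context-sensitive units with the input than any competitor built from the same segments.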

Wickelgren claims that his theory eliminates two of the major 
problems associated with perception models which postulate phonemes 
as the basic units: first, there is no need to segment the acoustic 
wave form; second, it is more likely — although the evidence is not in — 
that there is invariance in the acoustic signal for allophones. 
The model of speech perception proposed by L. V. Bondarko and
others (Bondarko et al., 1970) is designed to account for the set of
operations that transform an acoustic speech signal into a sequence
of words. Each word in the output would have associated with it a set
of lexical and grammatical features which would be employed in
understanding the message.

The model consists of hierarchically-arranged processes. At each 
level, there is a perceptual procedure, decision making, and a procedure 
for assigning a certain reliability to the decision. If no decision 
can be made with a threshold degree of reliability, the level outputs 
several possible interpretations of the input signal, and the final 
decision is postponed. The final decision may not be made, in fact, 
until the last stage — the recognition of the meaning of the utterance. 
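The reliability threshold and postponed decision can be sketched as follows. The scores, the threshold of 0.8, and the three-word lexicon are invented; Bondarko et al. are not being cited for any numerical values. The sketch shows an ambiguous level passing several candidates upward and a higher level resolving them.

```python
def decide(candidates, threshold=0.8):
    """One level of a hypothetical hierarchical recognizer.

    candidates maps each interpretation to a reliability score in [0, 1].
    If the best interpretation clears the threshold, commit to it; otherwise
    pass every interpretation upward (best first) and postpone the decision.
    """
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    if candidates[ranked[0]] >= threshold:
        return ranked[:1]
    return ranked

def word_level(initial_options, rest, lexicon):
    """Next level up: keep only the combinations that form words in the lexicon."""
    return [c + rest for c in initial_options if c + rest in lexicon]

# Hypothetical example: an initial stop that the phoneme level cannot resolve.
phoneme_output = decide({"p": 0.55, "b": 0.45})
words = word_level(phoneme_output, "lue", {"blue", "glue", "plum"})
```

Here the phoneme level cannot commit to /p/ or /b/, so both go upward, and only "blue" survives at the word level.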

The first stage of the perceptual process is auditory analysis, in which
the output of the cochlea is described in terms of the set of parameters that
are relevant in the perception of speech. The output of the auditory
analysis is then classified into phonemes (a phoneme is defined as the
subjective image employed by the brain of the listener in the process
of speech recognition (p. 114); thus it is not strictly equivalent
to the linguistic phoneme). Information distributed over an open syllable
is employed in this classification process. At the next level, the
string of phonemes is segmented, taking stress into account. Then the
segmented string is interpreted as a sequence of words.

The Motor Theory of Speech Perception 

Although motor theories of speech perception have been advanced
by quite a number of theorists, the most explicit and reasoned statement
of the motor theory has been formulated by workers at Haskins
Laboratories, among them F. S. Cooper, A. M. Liberman, and D. P. Shankweiler.
Already in an early discussion of some of their results (Cooper et al.,
1952), the Haskins group advanced the motor theory.

The research at Haskins began with a search for invariants in
speech: "A one-to-one correspondence between something half-hidden in
the spectrogram and the successive phonemes of the message" (Cooper
et al., 1952, p. 604). However, no acoustic invariant could be found
for the individual phonemes. In fact, Cooper suggests that the
perceived similarities and differences between speech sounds may
correspond more closely to the similarities and differences in articulation
than to the acoustic signal. As evidence for the simpler relation of
perception and articulation, Cooper cites the complex relationship of
the frequency of the burst of a stop consonant to the point of
articulation: a burst at 1440 cps is heard as /p/ before /i/ but as
/k/ before /a/; conversely, bursts at different frequencies can be
heard as the same consonant.

In connection with further work with synthetic speech, the Haskins 
group advanced the notion of categorial perception: perception of 
phonemes is different from perception of non-speech stimuli in that 
listeners can discriminate very little better than they can identify 
absolutely. An acoustic continuum is categorized into phonemes by 
listeners but a comparable non-speech continuum is not. Furthermore,
listeners show discrimination peaks at phoneme boundaries when the 
stimulus is speech, but no such peaks in discrimination appear when 
the stimulus is a comparable non-speech continuum (Liberman, Harris, 
Kinney, and Lane, 1957). These results, which are typically most
clear-cut for stop consonants, are readily explained by the motor
theory. It is argued that the gesture used in speech production is 
essentially invariant for the phoneme; therefore, perception is also 
invariant and categorial. 
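The claim that listeners can discriminate very little better than they can identify is often made quantitative by predicting discrimination from identification. The sketch below uses the covert-labeling prediction for ABX trials, P(correct) = 1/2 [1 + (p_A - p_B)^2], which assumes that listeners compare only category labels and guess when the labels of A and B agree. This formula is supplied here as an illustration, not quoted from the studies above, and the seven-step identification function is invented.

```python
def predicted_abx(p_a, p_b):
    """Predicted proportion correct in ABX discrimination if listeners use only
    covert category labels; p_a and p_b are the probabilities that stimuli A
    and B are labeled as the first category."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)

# Hypothetical identification function along a 7-step continuum
# (proportion of /b/ responses at each step).
identification = [1.0, 0.98, 0.9, 0.5, 0.1, 0.02, 0.0]

# Predicted discrimination for one-step pairs (1-2, 2-3, ..., 6-7).
predicted = [predicted_abx(a, b) for a, b in zip(identification, identification[1:])]
peak = max(range(len(predicted)), key=predicted.__getitem__)
```

Predicted discrimination stays near chance (0.5) within categories and rises only for the pairs that straddle the category boundary. This is the discrimination peak described above.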

In their most detailed explication of the motor theory (Liberman,
Cooper, Shankweiler, and Studdert-Kennedy, 1967), the Haskins group
recapitulates the many arguments advanced for the motor theory and also 
specifies at what "level" production is made use of in perception. In 
their earlier work, the assumption was made that the production invariants 
were "motor commands" which were identical for each production of a 
given phoneme. In their latest statement, the idea of motor commands 
is retained and the theory is extended to higher-level neural signals 
which stand in a one-to-one relationship with other segments of the 
language:

"In phoneme perception . . . the invariant is found far down
in the neuromotor system, at the level of the commands to
the muscles. Perception by morphophonemic, morphemic,
and syntactic rules of the language would engage the
encoding process at higher levels." (p. 454)

In this form, the motor theory becomes equivalent to analysis- 
by-synthesis, a theory of speech perception dependent on the use of 
rules in just such a way. 

Analysis by Synthesis 

Essentially, analysis by synthesis is a model of perception that 
depends on matching the incoming stimulus to an internally-generated 
pattern. When the internal pattern matches the stimulus, perception 
has been successful. As a model for speech perception, analysis by 
synthesis has been extensively developed by Morris Halle and Kenneth
N. Stevens. 

An early version of the model (Halle and Stevens, 1964) is diagrammed 
in Fig. 4. 




Fig. 4. Analysis-by-synthesis model (two stages, Stage I and Stage II;
input: speech signal).

(Morris Halle and Kenneth N. Stevens, "Speech Recognition:
a Model and a Program for Research," in The Structure
of Language, ed. by Jerry A. Fodor and Jerrold J. Katz,
1964, by permission of Prentice-Hall.)

The model depends on two analysis-by-synthesis loops. After a 
spectrum analysis, which in large part is a result of cochlear action, 
the first analysis-by-synthesis loop reduces the spectral representation 
of the acoustic input to a set of phonetic parameters. This is 
accomplished by matching the incoming spectrum to a spectrum produced 
by an internal synthesizer which has the ability to compute spectra 
when given phonetic parameters. In the second analysis-by-synthesis
loop, the phonetic parameters are transformed to a sequence of phonemes.
The second loop uses the generative rules that must also be employed
in speech production: rules that transform phonemes to phonetic
parameters.
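The first loop can be made concrete with a toy sketch. Every detail here is an assumption for illustration, not Halle and Stevens's specification: the "phonetic parameters" are reduced to two formant frequencies, the internal synthesizer (the hypothetical function synthesize) builds a spectrum from two resonance-like peaks, and matching is an exhaustive search that keeps the candidate with the smallest Euclidean distance to the input spectrum.

```python
import math
from itertools import product

# Hypothetical analysis bands: 100 Hz to 4000 Hz in 100-Hz steps.
BANDS_HZ = [100.0 * i for i in range(1, 41)]

def synthesize(f1, f2, bandwidth=120.0):
    """Toy internal synthesizer: phonetic parameters (two formant
    frequencies, in Hz) -> spectrum, as the sum of two resonance-like peaks."""
    return [
        1.0 / (1.0 + ((f - f1) / bandwidth) ** 2)
        + 1.0 / (1.0 + ((f - f2) / bandwidth) ** 2)
        for f in BANDS_HZ
    ]

def distance(a, b):
    """Euclidean distance between two spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def analyze_by_synthesis(input_spectrum, f1_candidates, f2_candidates):
    """Generate each candidate internally, compare it with the input,
    and return the parameters of the best-matching candidate."""
    best_error, best_params = math.inf, None
    for f1, f2 in product(f1_candidates, f2_candidates):
        error = distance(synthesize(f1, f2), input_spectrum)
        if error < best_error:
            best_error, best_params = error, (f1, f2)
    return best_params
```

Given a spectrum generated with formants at 700 and 1200 Hz, a search over a 100-Hz grid recovers exactly (700, 1200), since only that candidate reproduces the input. A realistic internal synthesizer would need a much richer parameter set and a far more economical search than trying every grid point.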

In a more recent statement of analysis by synthesis (Stevens and
Halle, 1965), the analysis-by-synthesis model is integrated with
linguistic concepts. The model is represented in Fig. 5.

Fig. 5. Model for the speech-generating and speech-perception
process. The dashed line encloses components of a
hypothetical analysis-by-synthesis scheme for speech
perception. (K. N. Stevens and M. Halle, "Remarks on
Analysis by Synthesis and Distinctive Features," in
Models for the Perception of Speech and Visual Form,
1965, by permission of M.I.T. Press.)

This model also claims that the mechanism employed in speech production 
is the same as the mechanism used in speech perception. Furthermore, the 
model employs abstract representations of words, coded in terms of
distinctive features, and phonological rules, apparently identical to 
the rules found in the phonological component of a generative grammar. 

The model operates in the following fashion. The auditory pattern
derived from the acoustic input undergoes preliminary analysis; the
exact nature of this preliminary analysis is not specified in the model.
On the basis of the preliminary analysis and contextual information, a
hypothesis is made concerning the abstract representation of the
utterance. The proposed abstract representation is converted to an
equivalent auditory pattern and compared with the pattern under analysis.
If there is agreement, then the hypothesized abstract representation
is judged to be correct, and processing at more abstract levels can
proceed.

The function of the rules is to convert abstract representations
to instructions to the vocal tract or to the equivalent auditory
representation. Thus, these rules are more abstract than the motor
commands postulated for the motor theory of speech perception.




Perceptual Strategies

The theory of perceptual strategies has been developed in close
relation to transformational grammar. Perceptual strategies are
techniques used by listeners to arrive at a segmentation of a sentence
into deep structure units and to assign the proper grammatical function
to each component. The theory is the result of research by M. Garrett,
J. A. Fodor, and Thomas Bever. At present, it is in a much more
fluid state than the other theories discussed so far, so it seems
appropriate to discuss the development of the theory, as well as its
current status.

The early statements of the theory (Fodor and Bever, 1965;
Garrett, Bever, and Fodor, 1966) were based on the phenomenon of
click localization: when presented with a sentence with a superimposed
click, the subject locates the click toward the nearest constituent
boundary. Furthermore, subjects localize clicks correctly primarily
when they occur on a constituent boundary. This phenomenon is
interpreted to mean that surface structure constituents form perceptual
units, tending to resist interruption by extraneous material.

In later work, more detailed analysis of perceptual strategies
followed. Fodor, Garrett, and Bever (Fodor and Garrett, 1967; Fodor,
Garrett, and Bever, 1968) suggest that information about the
properties of specific lexical items is employed by listeners. The
listener selects the verb of the sentence and classifies it according
to the possible deep structure configurations it can occur with; then
the listener checks all these possible deep structure configurations
to see if the surface structure he is presented with is a possible
transformational version of the deep structure. In this process of
selecting possible deep structures, the subject takes advantage of
surface structure markers; for example, "to" implies that the verb must
be able to take a "for...to" complementizer.

Later work also indicated that surface structure constituents
were not directly related to perception (Bever, Lackner, and Kirk,
1969). Rather, the units of perception seem to be deep structure units.

The current status of the theory of perceptual strategies, as
well as a summary of relevant research, has been presented by Bever
(1970). In this article, Bever rejects the theory of derivational
complexity. This theory claims that the perceptual complexity of a
sentence is directly related to the number of transformations involved
in its derivation. (A theory of analysis by synthesis at a syntactic
level would imply derivational complexity.) But Bever finds that, in
many cases, transformations are not related to perceptual complexity.
First, transformational rules that delete structure do not add
complexity; second, certain reordering transformations may even
simplify perception. For example, (1) is no more complex, and may
even be simpler, than (2):

(1) It amazed Bill that John left early. 

(2) That John left early amazed Bill. 

Bever then proceeds to discuss several perceptual strategies
employed by listeners. Some of these are the following:

a. When faced with a sentence, the listener isolates those adjacent 
phrases of surface structure which could correspond to a sentence in 
deep structure. The listener accomplishes this by segmenting together 
items that could be related as "actor, action, object...modifier."

b. Unless there is information to the contrary, the first noun... verb 
clause is treated as the main clause. 

c. Constructions are related internally according to semantic 
constraints. Essentially, the listener selects the most likely 
semantic organization. 

d. Any Noun-Verb-Noun sequence that is potentially a unit corresponds 
to "actor, action, object."

e. The special properties of function words and verbs are employed. 
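Strategy d can be illustrated as a simple pattern match over a string of part-of-speech-tagged words. This is a hypothetical sketch, not Bever's own formulation: the tag names (N, V, Det) and the function nvn_strategy are assumptions, and determiners are skipped so that a noun phrase such as "the cat" counts as a single noun.

```python
def nvn_strategy(tagged_words):
    """Hypothetical sketch of strategy d: map the first Noun-Verb-Noun
    sequence onto actor, action, object. Determiners ("Det") are skipped
    so that 'the dog chased the cat' still presents an N-V-N sequence."""
    content = [(word, tag) for word, tag in tagged_words if tag != "Det"]
    for i in range(len(content) - 2):
        (w1, t1), (w2, t2), (w3, t3) = content[i:i + 3]
        if (t1, t2, t3) == ("N", "V", "N"):
            return {"actor": w1, "action": w2, "object": w3}
    return None  # no candidate N-V-N unit: the strategy does not apply
```

For "the dog chased the cat" the function assigns dog as actor, chased as action, and cat as object. Like the heuristic it imitates, it looks only at the first N-V-N window and ignores the rest of the sentence.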

There is no need to give a complete list of proposed perceptual 
strategies, since all of them are proposed more or less tentatively. 
The general thrust of the theory, however, is this: to integrate
perceptual strategies that are discovered to be applicable in language
with other perceptual and cognitive processes, and to determine how
language is related to other human cognitive abilities.





CHAPTER TWO 



THE PERCEPTION OF SUB-PHONEMIC PHONETIC DIFFERENCES 

In the models of speech perception discussed in the preceding
chapter, it has been implicitly assumed that phonetic differences that
are less than phonemic can have no linguistic significance, and that
such differences can not be of any use to the listener. ("Phonemic"
is to be understood here as "reliably signaling a difference in
meaning.") This assumption follows directly from the traditional
notion of a phoneme as a functional unit, distinct from all other such
units. This view is also implicit in the notion of "categorial
perception of phonemes" recently advanced by workers at Haskins
Laboratories (Studdert-Kennedy, Liberman, Harris, and Cooper, 1970).

On the other hand, phoneticians can develop an ability to notice small 
phonetic differences. And even ordinary listeners are sensitive to 
non-linguistic information that may be carried by sub-phonemic 
differences; for example, in identifying a particular speaker, sub- 
phonemic information is employed. However, speaker identification 
judgments are not linguistic and may be based on a great deal more
information than the fine phonetic details of an utterance.

In order to establish a "baseline" for perceptual units, it would
be helpful to determine exactly how much use a subject can make of
non-phonemic phonetic differences for linguistic judgments.
A preliminary study related to this question was conducted by
D. B. Fry (1968). Fry found that he was able to identify productions
of the two words lax and lacks with no contextual information provided.

The experiment was conducted in the following way: Fry prepared a tape 
by splicing copies of one production of lax and one production of lacks 
in random order. He then listened to the tape, and, after hearing each 
word, he pushed a button to identify it. Fry obtained both identification 
scores and reaction time to the two words. He found, to his surprise, 
that he could identify the utterances correctly 96 times out of 100 
(a statistically significant result). Furthermore, he found that the 
reaction time to lacks was faster than to lax , although the difference 
was not statistically significant. 
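Why 96 correct out of 100 counts as significant can be checked with an exact binomial test. The sketch assumes each response is an independent guess with a 0.5 chance of being right; the text does not say which test Fry actually used, so this is one reasonable check rather than a reconstruction of his analysis.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct
    identifications out of n if every response were a guess."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Fry: 96 correct identifications out of 100 two-alternative trials.
p_value = binomial_tail(96, 100)
```

Under this assumption the probability of doing at least that well by guessing is on the order of 10^-24, far below any conventional significance level.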

Fry's study is quite tentative, so it is not proper to draw a
generalization from it. Fry tested only one subject, himself, and only 
one supposedly-homophonous word pair. There are a number of possible 
explanations of the results that do not imply that listeners are 
generally aware of sub-phonemic differences. First, Fry is a very fine 
phonetician; therefore, he may be sensitive to distinctions which 
completely escape the ordinary listener. Second, he may have, by chance, 
tested very distinctive productions of the two words; ordinarily, the 
two words may not be nearly so distinctive. Finally, it may be that 
some error in one or the other of the two words made them distinctive 
but not in a linguistic sense — there may have been some extraneous 
noise on the original recording of the utterance. 

However, Fry's finding, if it reflects a general listener ability,
has considerable implications for theories of speech perception.
Therefore, it seemed desirable to replicate Fry's experiment with control
over the variables mentioned above.

Method 

Stimuli: Ten pairs of words were selected, each pair consisting of
one monomorphemic and one bi-morphemic word of the same phonemic
shape. Each pair of words composed a sub-list; within the sub-list,
the two words were recorded in random order, each word appearing ten
times. Each sub-list was introduced by two sentences in which the two
words to be tested appeared in context. The following word pairs were
tested: wade/weighed, hose/hoes, bard/barred, pact/packed, lax/lacks,
baste/based, adds/adze, mist/missed, laps/lapse, and guest/guessed.

The speaker was a male graduate student, a speaker of General American, 
whose home is in Connecticut. 

The following procedure was employed to record the stimulus tape: 
for each production of each word to be recorded, the speaker was 
presented with a sketch picturing an activity suggestive of the word; 
underneath the sketch was a sentence employing the word, and 
descriptive of the sketch. The speaker was certain that under these
circumstances he could produce the "correct word."

Two stimulus tapes were recorded; the second tape was a counter- 
balanced version of the first tape. On both tapes, words within lists 
were separated by five seconds; sub-lists were separated by ten 
seconds. Both tapes were recorded in a sound-proof recording booth, 
on an Ampex 350 tape recorder, at 7 1/2 i.p.s. 

Subjects: Two groups of subjects participated in the experiment: 17
undergraduate students with no training in phonetics, and 12 graduate
students in an introductory or advanced phonetics class.

The subjects were informed that the purpose of the experiment
was to determine how quickly and how accurately people could identify 
words that sound very much the same. The subjects were instructed 
to respond as quickly as possible and to guess if they did not know 
which word they heard. 

Procedure: The instrumentation is described in the accompanying diagram
(Fig. 6). Each subject listened to the stimulus tape over earphones; he
responded to each word by pushing one of two buttons, which were labeled,
to identify which word he heard. The buttons were connected to two signal
generators, one generating a sine wave, the other a square wave. Both
the stimulus tape and the subject's response were recorded on a
two-channel Ampex tape recorder. Thus both the reaction time and the
response were available for later analysis. Each subject responded to
one complete list of 200 utterances. After the test, each subject was
asked which pairs of words he felt he did well on and which pairs he
felt he could not tell apart.

Fig. 6. Instrumentation for experiment testing the
perception of sub-phonemic phonetic differences. (Stimulus
tape recorder feeding the subject's earphones; response
buttons driving a sine-wave generator and a square-wave
generator; stimulus and response recorded on channels 2
and 1 of a two-channel tape recorder.)
The tapes of each subject's performance were analyzed by
computer. First, the voltages on each tape were digitized on a
Radiation Inc. Analog Data Conversion System 152. The Ohio State
University Instruction and Research Computer Center's IBM S/360
Model 75 computer was used for further processing. The computer was
programmed to determine changes in voltage. The transition from
silence to voltage on the response channel was interpreted as the
beginning of a response. The response was then categorized as either
a sine wave or a square wave. The second channel, containing the voice
signal, was scanned to determine the transition from silence to
voltage; this was construed as the beginning of a signal. The
difference between the beginning of the signal and the beginning of
the response was considered to be reaction time.¹



¹Measuring reaction time to speech stimuli, which exist in
time, presents a problem not encountered in measuring reaction time
to visual stimuli, namely, at what point the subject can be said to
begin to respond. The subject may begin to respond during the
presentation of the word or after he has heard the entire word.
Correspondingly, reaction time can be measured either from the
beginning or from the end of the word. For this experiment, I have
chosen to measure reaction time from the beginning of the word, in
full awareness that either decision creates difficulties.



However, because of technical difficulties with the recordings,
not all responses by every subject could be recovered.
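The scoring procedure described above (find the silence-to-voltage transition on each channel, classify the response wave, subtract onsets) can be sketched as follows. The threshold, window length, sampling rate, and the crest-factor rule for telling a sine wave from a square wave are all assumptions for illustration; the text does not say how the original IBM 360 program made the sine/square decision, and the function names are hypothetical.

```python
import math

def onset_index(samples, threshold):
    """Index of the first sample whose magnitude exceeds threshold, i.e. the
    transition from silence to voltage; None if the channel stays silent."""
    for i, v in enumerate(samples):
        if abs(v) > threshold:
            return i
    return None

def classify_response(segment):
    """Label a response burst by its crest factor (peak / RMS), which is about
    1.41 for a sine wave and 1.0 for a square wave (assumed decision rule)."""
    peak = max(abs(v) for v in segment)
    rms = math.sqrt(sum(v * v for v in segment) / len(segment))
    return "square" if peak / rms < 1.2 else "sine"

def score_trial(voice, response, rate_hz, threshold=0.1, window=200):
    """Return (reaction time in seconds, response label), measured from the
    beginning of the word; None when either onset cannot be recovered."""
    signal_start = onset_index(voice, threshold)
    response_start = onset_index(response, threshold)
    if signal_start is None or response_start is None:
        return None
    label = classify_response(response[response_start:response_start + window])
    return (response_start - signal_start) / rate_hz, label
```

Returning None for a trial mirrors the situation noted above, where some responses could not be recovered from the recordings.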

Results 

Identification: The over-all scores, given in Table 1, indicate
that subjects do not seem to be able to identify the words correctly
at significantly above chance levels. These results are presented
graphically in Fig. 7. Furthermore, phonetics students do not seem
to perform significantly differently from phonetically untrained
subjects.

Fig. 7. Per cent correct identifications for each word pair
(wade/weighed, hose/hoes, bard/barred, pact/packed, lax/lacks,
baste/based, adds/adze, mist/missed, laps/lapse, guest/guessed;
horizontal axis: per cent correct, 30 to 75).

When the responses of the subjects to each production are analyzed,
however, it appears that subjects are very consistent in their responses
to some of the test items. Clearly consistent judgments (significant
at the .02 level or better) for at least one production were obtained
for the following pairs tested: weighed/wade, barred/bard, lax/lacks,
baste/based, and mist/missed. Two pairs tested did not produce any
significant agreement among subjects: hose/hoes and lapse/laps. Three
pairs may or may not be considered significant; in each of these pairs,
agreement in responses was reached for four productions at the .05
level of significance.
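The text does not say which statistical test underlies the ".02 level" of agreement. Assuming an exact two-tailed binomial test against chance (each subject picking either word with probability 0.5), the sketch below computes how many of n subjects must give the same label to a production before their agreement counts as significant. The function names are hypothetical, and the criterion used in the original analysis may have been different.

```python
from math import comb

def two_tailed_p(k, n):
    """Exact two-tailed binomial p-value for k of n subjects choosing the
    same word, assuming each picks either word with probability 0.5."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2**n
    return min(1.0, 2 * min(upper, lower))

def min_agreement(n, alpha=0.02):
    """Smallest number of agreeing subjects (out of n) significant at alpha."""
    for k in range(n // 2, n + 1):
        if two_tailed_p(k, n) <= alpha:
            return k
    return None
```

With 13 subjects, for example, this particular test requires 12 of them to agree (11 of 13 gives p of about .022), which shows how demanding a .02 criterion is for groups of this size.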
TABLE 2

CONSISTENCY OF SUBJECTS' RESPONSES
PER CENT OF SUBJECTS AGREEING IN RESPONSE
(underlined scores are significant at .02 level)

List A

Prod.   wade   hose   bard   pact    lax  baste   adds   mist  lapse  guest
    1   61.5   16.7   66.7   54.5   46.2   85.7   53.3  100.0   54.5   25.0
    2   53.8   69.2   69.2   36.4   33.3   33.3   50.0   30.0   81.8   41.7
    3   41.7   66.7   23.1   81.8   23.1   42.9   58.3   50.0   54.5   58.3
    4   53.8   61.5   76.9   36.4   50.0   42.9   33.3   40.0   81.8   66.7
    5   50.0   76.9   58.3   81.8   33.3   50.0   41.7   50.0   36.4   45.5
    6   53.8   50.0   58.3   54.5   25.0   50.0   41.7   60.0   36.4   45.5
    7   50.0   53.8   15.4   70.0   46.2   57.1   50.0   66.7   45.5   58.3
    8   46.1   69.2   75.0   54.5   61.5   66.7   50.0   70.0   72.7   41.7
    9   45.5   46.1   38.5   81.8   53.8   71.4   16.7   60.0   63.6   66.7
   10   38.5   46.1   61.5   81.8   61.5   66.7   75.0   20.0   45.5   50.0
   11   69.2   76.9   30.8   30.0   18.2   18.2   33.3   30.0   45.5   41.7
   12   58.3   66.7   30.8   50.0   53.8   72.7   75.0   70.0   50.0   58.3
   13   38.5   46.1   61.5   27.3   53.8   50.0   33.3   66.7   54.5   16.7
   14   30.8   46.1   84.6   63.6   41.7   61.5   54.5   50.0   45.5   50.0
   15   63.6   46.1      ?   45.5   30.8   78.5   58.3   50.0   27.3   58.3
   16   69.2   58.3   46.1   54.5   50.0   58.3   25.0   80.0   54.5   58.3
   17   69.2   53.8   61.5   63.6   46.2   28.6   83.3   55.5   45.5   66.7
   18   33.5   61.5   46.1   45.5   46.2   57.1   58.3   40.0   54.5   58.3
   19   46.1   61.5   15.4   45.5   58.3   78.5   58.3   40.0   63.6   50.0
   20   61.5   61.5   38.5   63.6   38.5   33.3   41.7   70.0   54.5   50.0

List B

Prod.   wade   hose   bard   pact    lax  baste   adds   mist  lapse  guest
    1   63.6   45.5   27.3   60.0   58.3   16.7   75.0   63.6   71.4   42.9
    2   54.5   55.5   45.5   20.0   50.0   36.4   50.0   45.5   28.6   69.2
    3   63.6   55.6   54.5   55.6   50.0   58.3   25.0   63.6   50.0   61.5
    4   54.5   50.5   18.2   70.0   54.5   33.3   25.0   63.6   50.0   50.0
    5   60.0   27.3   66.7   50.0   45.6   75.0   75.0   54.5   71.4   57.1
    6   54.5   60.0   60.0   55.6   63.6   50.0   16.7   36.4   57.1   21.4
    7   36.4   36.4   30.0   50.0   41.7   66.7   50.0   50.0   42.9   64.3
    8   54.5   81.8   10.0   44.4   45.6   50.0   41.7   50.0   61.5   69.2
    9   54.5   10.0   50.0   77.8   83.3   20.0   83.3   45.5   42.9   28.6
   10  100.0   45.5   45.5   60.0   41.7   54.5   50.0   72.7   64.3   35.7
   11   66.7   55.5   63.6   60.0   54.5   25.0   58.3   70.0   50.0   28.6
   12   50.0   70.0   36.4   60.0   36.4   50.0   36.3   27.3   50.0   21.4
   13   36.4   45.5   63.6   25.0   66.7   41.7   54.5   27.3   30.8   42.9
   14   63.6   36.4   30.0   50.0   33.3   75.0   58.3   81.8   46.2   21.4
   15   60.0   55.6   18.2   30.0   33.3   45.5   50.0   27.3   71.4   61.5
   16   63.6   55.6   54.5   50.0   41.7   33.3   54.5   27.3   28.6   53.8
   17   54.5   63.6   30.0   57.1   66.7   66.7   45.5   45.5   64.3      ?
   18   18.2   40.0   36.4   70.0   22.2   25.0   50.0   45.5   57.1   42.9
   19   72.7   27.3   40.0   55.6   63.6   72.7   63.6   63.6   42.9   57.1
   20   72.7   60.0      ?   20.0   66.7   50.0   25.0      ?      ?      ?
The consistency of subjects' responses is represented in Table 2.

Even when subjects are highly consistent in agreeing on a particular
response, they do not necessarily identify the word correctly; the
identification scores for utterances on which subjects agree on one
response (at the .02 level) are still at chance level (57% correct).

Subject Interview: For each subject, the identification score was
calculated for the word pair he judged easiest and for the word pair
he judged most difficult, and these scores were averaged. The score
thus represents each subject's performance in relation to his own
judgment of ease and difficulty, and does not represent performance
on any one word pair. The differences found were not statistically
significant, but did lie in an interesting direction: both phonetically
trained and phonetically untrained subjects performed better on the
word pairs they considered easy than on the word pairs they considered
difficult.



TABLE 3

SUBJECTS' PERFORMANCE IN RELATION TO JUDGMENTS
OF EASE AND DIFFICULTY

                                 Word Pair Judged      Word Pair Judged Most
                                 Easiest (% Correct)   Difficult (% Correct)
All Subjects                            53.10                  46.01
Phonetics Students                      51.60                  49.20
Phonetically Untrained Students         54.10                  43.80

Furthermore, subjects show a fair amount of agreement in judging
which pairs of words are difficult and which are easy. Table 4 shows
the number of times each word pair was judged easy and the number of
times each word pair was judged difficult.

TABLE 4

EASE AND DIFFICULTY OF WORD PAIRS AS JUDGED BY SUBJECTS

Word pair          Number of times    Number of times
                   judged easy        judged difficult
wade/weighed              6                  4
hose/hoes                 3                  7
bard/barred               5                  1
pact/packed               1                  2
lax/lacks                 1                  3
baste/based               3                  2
adds/adze                 1                  5
mist/missed               3                  0
laps/lapse                3                  3
guest/guessed             3                  1



Reaction time: Reaction time was not determined for all subjects.
As Tables 5 to 8 show, reaction time was quite slow for all subjects
and to all word pairs. There is no significant systematic difference
in reaction time between correct and incorrect responses.

Reaction time to productions labeled consistently is quite 
variable. When the reaction time to consistently labeled productions 
is compared with the mean reaction time for that word pair, the 
differences in reaction time are in no way systematic. When the 
differences are statistically significant, however, then reaction time 
is longer to the consistently labeled production. These data are 
presented in Table 9. 

When reaction time to mono-morphemic and to bi-morphemic words
is examined, there is some tendency for reaction time to be shorter
to the bi-morphemic word, as Fry discovered. The differences, however,
are not statistically significant. These data are presented in
Tables 10 to 12.

TABLE 9

REACTION TIME, IN SECONDS, TO PRODUCTIONS LABELED CONSISTENTLY
(reaction times significantly different from mean are underlined)

Acoustic analysis: In order to discover the acoustic cues that subjects
were employing to arrive at consistent labeling, spectrograms were
made of all productions that were labeled consistently. Spectrograms
were also made of some productions for each word pair that were labeled
at random, and of the production that immediately preceded the
consistently labeled production. Spectrograms were made on a Kay
Electric Company Sonagraph.

It was found that subjects were employing two types of cues:
slight differences in consonant quality and differences in vowel
duration. For the word pairs baste/based, mist/missed, and lax/lacks,
subjects were responding to a slight difference in the fricative
[s]. The consistently labeled productions had more energy, at all
frequencies, in the fricative than the productions that were labeled
at random.

The word pairs wade/weighed and bard/barred were labeled 
consistently on the basis of vowel duration. However, subjects 
apparently were not responding to absolute differences in vowel 
duration, but to the duration of a vowel compared to the duration 
of the vowel of the preceding production. Thus a production [bard]
would be labeled barred if it followed a production with a perceptibly
shorter vowel; it would be labeled bard if it followed a production
with a perceptibly longer vowel. It did not matter whether the
word was intended as "bard" or "barred."
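The relative strategy just described can be sketched as a rule that compares each production only with the one heard before it. The 20-msec "perceptible difference" threshold, the function name, and the example durations are hypothetical; the text reports neither a threshold nor the measured durations.

```python
def label_by_relative_duration(vowel_ms, jnd_ms=20.0):
    """Label each production of bard/barred by comparing its vowel duration
    with that of the production heard just before it (hypothetical rule)."""
    labels = [None]  # the first production has nothing to be compared with
    for previous, current in zip(vowel_ms, vowel_ms[1:]):
        if current - previous > jnd_ms:
            labels.append("barred")  # perceptibly longer than the one before
        elif previous - current > jnd_ms:
            labels.append("bard")    # perceptibly shorter than the one before
        else:
            labels.append(None)      # no perceptible difference: a guess
    return labels
```

Because the label depends only on the preceding production, the same physical vowel duration can be called bard on one occasion and barred on another, which fits the finding that these labels were consistent without being correct.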
Discussion

To a great extent, the results of this experiment are negative.
Subjects can not identify the word pairs correctly. They do not
perform significantly better on the word pairs they consider easy than
on the word pairs they consider difficult. And no inferences can be
drawn from the reaction time except that, because the reaction time is
very slow, the subjects find it difficult to decide which word they
have heard.

However, subjects seem to be aware of at least some sub-phonemic
information, since they label some word pairs consistently, even
though not correctly. Faced with the task of the experiment, subjects
develop a strategy for making use of fine phonetic detail. In this
manner they arrive at some consistent labelings. But since the
identifications based on this strategy are equally likely to be correct
or incorrect, the strategy can not be considered to be part of
ordinary speech perception.

Thus the results of the experiment imply that even though subjects 
may become aware of sub-phonemic differences, they do not know what 
linguistic use to make of them. 
CHAPTER THREE



THE PERCEPTION OF OBSTRUENT CLUSTERS 

Studies dealing with the perception of order of non-speech sounds 
indicate that perceiving the order of sounds of short duration is 
quite problematic. Hirsch (1959) reported that, after considerable 
practice, subjects could perceive the order of two sounds correctly 
if the onset of the sounds was separated by 15 to 20 msec. For
stimuli, Hirsch used tones and bursts of noise 500 msec. in duration,
as well as clicks. Hirsch concludes that the minimal temporal interval
required for perception of order is independent of the duration of the
sound (within the limits of the experiment) and of the quality of the
sound.

Broadbent and Ladefoged (1959) found that, at first, subjects
could not perceive the order of sounds unless the onset of the sounds
was separated by 150 msec.; with considerable training, a 30-msec.
separation became adequate for accurate perception of order. Broadbent
and Ladefoged used three different stimuli: a "hiss," high-frequency
noise of 120 msec. duration; a "pip," an 800 cps sine wave of 30
msec. duration; and a "buzz," a 171 cps square wave of 30 msec.
duration.

Both these experiments involved the perception of the order of
only two elements. However, the task is much more difficult when the
subject has to determine the order of three or more elements. Several
experiments involving the ordering of more than two sounds are
reported by Warren and Warren (1970). In the first experiment,
subjects were asked to determine the order of three sounds (a hiss, a
tone, and a buzz, each lasting 200 msec.), which were repeated over
and over without pauses. The subjects performed no better than chance.
When the order of four sounds (a high tone, a low tone, a buzz, and
a hiss, each lasting 200 msec.) was to be judged, the duration of each
item had to be increased to between 300 and 700 msec. for half of the
subjects to identify the sequence correctly. In the last experiment,
the subjects were asked to judge the order of four 200-msec. vowel
segments, cut from productions of extended vowels and spliced together
without pauses. The subjects performed no better than chance.
Identification of order became possible only when a 50-msec. silent
interval was introduced between the vowels.

These experiments show that subjects have considerable difficulty 
in perceiving the order of sounds. However, listeners have no 
comparable difficulty with the order of elements in perceiving speech, 
even though many speech sounds are of quite short duration. Words 
like tax and task, ax and ask are normally perceived correctly, even
though the duration of the consonants in the cluster is close to the
minimum discovered in the Hirsch experiment. A reasonable estimate
of the duration of p, t, and k is 51 msec., 30 msec., and 36 msec.,
respectively (Lehiste, 1970). These figures are derived from Estonian
short voiceless stops.

It is, of course, a common observation that children have
difficulty with such clusters; aks is a very common child pronunciation
of ask, for example. And historically, such clusters have been
prone to metathesis.¹ Still, adults seem to have no trouble with
these clusters in the ordinary use of speech.

¹It may be that the sporadic occurrence of metathesis, found
in historical change, could be better explained by examining errors
in perception rather than errors in production, which has been the
traditional starting point for discussing language change.

The observation that children have trouble with obstruent clusters
but adults do not could imply that the adults' proficiency is a result
of considerable practice. Both the Broadbent and Ladefoged and the
Hirsch experiments show that the perception of order improves with
practice. Analogously, the adults' proficiency could be a result of
practice acquired in the course of language learning. However, it
is also possible, and has been suggested by a number of theorists,
that some special mechanisms are employed in the perception of consonant
clusters. Thus Broadbent and Ladefoged report that the introspective
feeling, developed in judging order, is that two items become
differentiated on the basis of over-all quality rather than order.
They suggest that the perceptual mechanism operates on discrete samples
of perceptual information; when two items fall into the same sample,
their order has to be inferred on some other basis. On the basis of
the Broadbent and Ladefoged and Hirsch experiments, Neisser (1967)
argues that a listener gradually learns to distinguish a cluster like
ts from a cluster like st, rather than perceiving a sequence of t
followed by s, or s followed by t. He implies that such clusters
are perceptual units to the listener, not normally analyzed further.
Wickelgren's idea of context-sensitive coding, presented in detail
in Chapter One (Wickelgren, 1969a, 1969b), can also explain the
fact that adults easily perceive a sequence of consonants correctly.
When a listener is presented with a consonant cluster, e.g. sk, he
knows that it is composed of two elements, but he does not encode
these elements in order; rather, the cluster is coded as an unordered
set, with each element identified for what precedes and follows it.
Schematically, the coding would be something like the following:
s (in the context #_k) and k (in the context s_#). These elements can
be assembled in the correct order, and the listener can arrive at the
intended sequence.
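The coding idea can be made concrete with a small sketch. This is an illustration under our own assumptions, not Wickelgren's formalism: segments are single letters, each context-sensitive element is a (left, segment, right) triple with None standing for the boundary #, and the function names encode and decode are hypothetical. Because the code is a set, no order is stored, yet the order can be recovered by chaining elements whose contexts overlap.

```python
from typing import List, Optional, Set, Tuple

# (left neighbour, segment, right neighbour); None stands for a word boundary (#).
Triple = Tuple[Optional[str], str, Optional[str]]


def encode(segments: List[str]) -> Set[Triple]:
    """Store each segment with its immediate context; no explicit order is kept."""
    padded = [None] + list(segments) + [None]
    return {(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)}


def decode(code: Set[Triple]) -> List[str]:
    """Rebuild the sequence by chaining elements whose contexts overlap.

    Assumes a well-formed code in which no (left, segment) pair repeats.
    """
    remaining = set(code)
    current = next(t for t in remaining if t[0] is None)  # the word-initial element
    remaining.remove(current)
    ordered = [current[1]]
    while current[2] is not None:
        left, seg = current[1], current[2]
        current = next(t for t in remaining if t[0] == left and t[1] == seg)
        remaining.remove(current)
        ordered.append(current[1])
    return ordered


# sk and ks contain the same two segments but receive different codes.
print(encode(["s", "k"]) == encode(["k", "s"]))  # False
print(decode(encode(list("ask"))))               # ['a', 's', 'k']
```

The set representation is the point of the example: order is implicit in the contexts, so a listener who loses the fine contextual detail loses order information before losing the segments themselves.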

The perception of obstruent clusters is an interesting problem
for empirical study, particularly since it is related to the almost
universally accepted notion that the minimal unit in speech perception
is the phoneme. Both Neisser's suggestion and Wickelgren's theory, if
substantiated, would argue against this view.

An experiment was designed to investigate the perceptual mechanisms
employed in the perception of obstruent clusters. By observing the
pattern of confusions of obstruent clusters in the presence of noise,
it is possible to make some inferences about the perceptual mechanisms
underlying the perception of these clusters.

Method 

Stimuli: Fifteen pairs of English words were selected which differed
from each other only in the order of obstruents in a cluster. Five
pairs of words ended in the obstruent cluster ps/sp; five ended in
ts/st; five ended in ks/sk. For each obstruent cluster, there was
one pair of two-syllable words; in addition, each obstruent cluster
appeared at least once with no morpheme boundary in the cluster. The
full list of words is reproduced below:



    ps/sp        ts/st        ks/sk

    apse         Blatz        ax
    asp          blast        ask
    lips         mats         tax
    lisp         mast         task
    Capsian      blitzer      axing
    Caspian      blister      asking
    claps        boots        Max
    clasp        boost        mask
    raps         coats        bricks
    rasp         coast        brisk



Three lists were constructed. On each list, each word appeared
two times in random order; the order was arrived at by using a table
of random numbers. Thus each list consisted of 60 words; each consonant
cluster appeared on each list ten times.
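The list construction can be reproduced with a pseudo-random generator standing in for the printed table of random numbers (a substitution on our part). The names PAIRS, CLUSTER, and make_list are hypothetical, and the seeds are arbitrary; the sketch only checks the arithmetic above: 30 words, each twice, gives 60 items, and each of the six clusters (5 words times 2) occurs 10 times per list.

```python
import random
from collections import Counter

# The fifteen test pairs; in each pair the first word has the first cluster of the label.
PAIRS = {
    "ps/sp": [("apse", "asp"), ("lips", "lisp"), ("Capsian", "Caspian"),
              ("claps", "clasp"), ("raps", "rasp")],
    "ts/st": [("Blatz", "blast"), ("mats", "mast"), ("blitzer", "blister"),
              ("boots", "boost"), ("coats", "coast")],
    "ks/sk": [("ax", "ask"), ("tax", "task"), ("axing", "asking"),
              ("Max", "mask"), ("bricks", "brisk")],
}

# Map every word to its cluster, e.g. "apse" -> "ps", "asp" -> "sp".
CLUSTER = {}
for label, pairs in PAIRS.items():
    first, second = label.split("/")
    for a, b in pairs:
        CLUSTER[a] = first
        CLUSTER[b] = second


def make_list(rng: random.Random, repetitions: int = 2) -> list:
    """Return all 30 words, each repeated `repetitions` times, in shuffled order."""
    sequence = list(CLUSTER) * repetitions
    rng.shuffle(sequence)
    return sequence


lists = [make_list(random.Random(seed)) for seed in (1, 2, 3)]
print(len(lists[0]))                          # 60
print(Counter(CLUSTER[w] for w in lists[0]))  # every cluster appears 10 times
```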

The speaker was a male with a medium-pitch voice, from Akron,
Ohio. Before recording, the speaker practiced for some time so that
he could produce the stressed vowel of each word at a constant intensity.
This was accomplished by monitoring the v.u. meter on the tape recorder.
When the speaker was producing the words at a constant intensity, the
actual recording was made, monitoring each production to keep the
intensity at a constant level. The three lists were recorded in a
sound-proof recording booth on an Ampex 350 tape recorder, at 7 1/2
i.p.s. Words were separated by 2.5 seconds; after every five words,
there was a gap of 5 seconds.

The stimulus tape was made by re-recording the master tape while
adding "white" noise produced by a Grason-Stadler noise generator.
The instrumentation is shown in the accompanying diagram (Fig. 8).




Fig. 8. Instrumentation for adding noise to stimulus tape. 
Three different signal-to-noise ratios were employed for the three
lists: the first list was re-recorded at a signal-to-noise ratio of
0 d.b.; the second list was re-recorded at a signal-to-noise ratio of
+12 d.b.; the third list was re-recorded at a signal-to-noise ratio of
-6 d.b.
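A signal-to-noise ratio in decibels is 10 log10 of the ratio of signal power to noise power, so +12 d.b. means the speech has about 15.8 times the power of the noise, 0 d.b. means equal power, and -6 d.b. means the noise has about four times the power of the speech. The noise was mixed in with analog equipment; the digital sketch below is only an illustration, and it assumes (our assumption, not stated in the text) that levels are compared as mean power over the whole signal. The tone and noise are hypothetical stand-ins for the recorded words.

```python
import math
import random


def mean_power(samples):
    """Average power (mean square) of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


def scale_noise(signal, noise, target_db):
    """Rescale `noise` so that its SNR relative to `signal` equals target_db."""
    wanted_noise_power = mean_power(signal) / 10 ** (target_db / 10)
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    return [gain * n for n in noise]


# Hypothetical stand-ins: a 440 Hz tone for the speech, Gaussian white noise.
rate = 8000
speech = [math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(rate)]

for target in (0, 12, -6):
    noise = scale_noise(speech, white, target)
    stimulus = [s + n for s, n in zip(speech, noise)]  # the degraded stimulus
    print(target, round(snr_db(speech, noise), 3))
```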

Subjects: Nineteen subjects participated in the experiment. All
were members of The Ohio State University linguistics department and
native speakers of English.

Procedure: The experiment was conducted as a listening test. Before
the test, subjects were instructed to write what they heard, and to
guess if necessary; they were told to expect some unusual words, and
these words were shown to them. For the test, the stimulus tape was
played on a tape recorder while the subjects listened over earphones
and wrote what they heard on an answer sheet. Each subject listened
to the entire tape (3 lists), and thus responded to 180 stimulus words.

In addition, five subjects took the test a second time. In the
second test, the listening conditions were identical to those of the
first test, but the subjects were instructed to say what they heard.
The subjects' spoken responses and the stimulus tape were recorded
on separate channels of an Ampex 354 tape recorder.

The subjects' responses were tabulated in the form of confusion
matrices. The answers were scored only for the perception of the
obstruent clusters. Thus, if the stimulus word was raps but the
subject wrote laps, he was scored correct.
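The scoring rule looks only at the cluster. A minimal sketch, assuming written answers are classified by their spelling; the pattern table and the names cluster and is_correct are hypothetical, and the text does not say how the classification was carried out. The patterns "tz" and "x" cover the spellings of ts and ks among the test words (Blatz, blitzer, ax, axing, tax, Max).

```python
from typing import Optional

# Hypothetical spelling patterns mapped to the six clusters.
PATTERNS = {"ps": "ps", "sp": "sp", "ts": "ts", "tz": "ts", "st": "st",
            "ks": "ks", "sk": "sk", "x": "ks"}


def cluster(word: str) -> Optional[str]:
    """Return the first obstruent cluster found in the spelling, or None."""
    w = word.lower()
    for i in range(len(w)):
        for pattern, label in PATTERNS.items():
            if w.startswith(pattern, i):
                return label
    return None


def is_correct(stimulus: str, response: str) -> bool:
    """Score a response only on its cluster, ignoring the other sounds."""
    target = cluster(stimulus)
    return target is not None and cluster(response) == target


print(is_correct("raps", "laps"))  # True: the cluster ps was perceived correctly
print(is_correct("raps", "rasp"))  # False: a reversal error
```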

The response tapes of the five subjects who gave spoken responses
were processed by an Elema-Schonander Mingograf, each channel of the
tape being represented as an oscillogram on a separate channel of the
Mingograf. The paper speed was 100 mm/sec.

Reaction time was determined by measuring from the onset of the 
stimulus word to the onset of the response, and from the end of the 
stimulus word to the onset of the response. There was no difficulty 
in measurement when the signal-to-noise ratio was +12 d.b. When the 
signal-to-noise ratio was 0 d.b., measurements from the stimulus word 
had to be made from the vowel rather than from the consonants.

Reaction time could not be determined when the signal-to-noise ratio 
was -6 d.b. 
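At a paper speed of 100 mm/sec, each millimetre of the oscillogram corresponds to 10 msec, so both reaction-time measures reduce to differences of positions along the paper. A sketch with hypothetical function names and made-up positions:

```python
PAPER_SPEED_MM_PER_S = 100.0  # Mingograf paper speed given in the text


def mm_to_ms(distance_mm: float, speed: float = PAPER_SPEED_MM_PER_S) -> float:
    """Convert a distance along the paper to elapsed time in milliseconds."""
    return distance_mm * 1000.0 / speed


def reaction_times(stim_onset_mm: float, stim_end_mm: float, resp_onset_mm: float):
    """Return (RT from stimulus onset, RT from stimulus end), both in msec."""
    from_onset = mm_to_ms(resp_onset_mm - stim_onset_mm)
    from_end = mm_to_ms(resp_onset_mm - stim_end_mm)
    return from_onset, from_end


# Hypothetical positions, in mm from an arbitrary reference mark on the paper.
print(reaction_times(12.0, 57.0, 131.0))  # (1190.0, 740.0)
```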

Results 

Confusions: The results are presented in the accompanying confusion
matrices (Tables 13 to 51). Each cell of the matrices shows the
number of times the stimulus consonant cluster, given at the beginning
of the row, was identified as the consonant cluster given in the
column heading. Correct responses lie on the diagonal. In addition,
the percentage of all the responses in each row that fall in a
particular cell is given for each cell. A.I. (articulation index)
gives the proportion of correct identifications for each matrix.
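The matrices and the articulation index can be computed as follows. This is a sketch with hypothetical helper names; the counts are those of Table 13 (all responses, +12 d.b.), and with them the function reproduces the printed A.I. of .8599 (1191 correct out of 1385 responses).

```python
LABELS = ["TS", "ST", "PS", "SP", "KS", "SK"]

# Counts from Table 13 (all responses, +12 d.b.):
# rows are stimulus clusters, columns are response clusters.
TABLE_13 = [
    [181, 20, 5, 0, 3, 12],
    [15, 182, 0, 3, 2, 29],
    [4, 2, 206, 20, 6, 1],
    [1, 8, 8, 177, 0, 35],
    [0, 1, 3, 2, 219, 5],
    [3, 0, 3, 2, 1, 226],
]


def tabulate(pairs, labels=LABELS):
    """Build a confusion matrix from (stimulus, response) cluster pairs."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for stimulus, response in pairs:
        matrix[index[stimulus]][index[response]] += 1
    return matrix


def row_percentages(matrix):
    """Percent of each row's responses falling in each cell."""
    return [[100.0 * count / sum(row) for count in row] for row in matrix]


def articulation_index(matrix):
    """Proportion of all responses lying on the diagonal (correct)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(round(articulation_index(TABLE_13), 4))     # 0.8599
print(round(row_percentages(TABLE_13)[0][0], 1))  # 81.9
```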

Tables 13 to 15 give confusion matrices for all responses. As
is to be expected, the lower the signal-to-noise ratio, the more
confusion errors occur. It can be observed that, for all
consonant clusters, the most common error is a reversal of the
consonant cluster. Furthermore, the stop-fricative cluster is
perceived correctly more often than the corresponding fricative-stop
cluster. This effect may result from the higher frequency of
stop-fricative clusters in English.
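The reversal claim can be checked directly on a confusion matrix. The sketch below, with hypothetical helper names, finds each row's most frequent error and the share of all errors that are reversals, using the counts of Table 14 (all responses, 0 d.b.). With these counts every row's most frequent error is its reversal, and reversals account for 306 of the 566 errors.

```python
LABELS = ["TS", "ST", "PS", "SP", "KS", "SK"]

# Counts from Table 14 (all responses, 0 d.b.); rows = stimulus, columns = response.
TABLE_14 = [
    [109, 67, 6, 1, 2, 16],
    [87, 76, 10, 1, 1, 17],
    [19, 13, 83, 51, 6, 14],
    [27, 13, 44, 69, 8, 25],
    [16, 6, 3, 5, 119, 27],
    [13, 5, 24, 9, 30, 87],
]


def reversal(label):
    """The same two obstruents in the opposite order, e.g. TS -> ST."""
    return label[::-1]


def most_common_error(matrix, labels=LABELS):
    """For each stimulus, the incorrect response given most often."""
    result = {}
    for i, stimulus in enumerate(labels):
        errors = [(count, labels[j]) for j, count in enumerate(matrix[i]) if j != i]
        result[stimulus] = max(errors)[1]
    return result


def reversal_share(matrix, labels=LABELS):
    """Fraction of all errors that are reversals of the stimulus cluster."""
    index = {label: i for i, label in enumerate(labels)}
    reversals = sum(matrix[i][index[reversal(s)]] for i, s in enumerate(labels))
    errors = sum(sum(row) - row[i] for i, row in enumerate(matrix))
    return reversals / errors


for stimulus, error in most_common_error(TABLE_14).items():
    print(stimulus, error, error == reversal(stimulus))
print(round(reversal_share(TABLE_14), 3))
```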

The pattern of confusions for written responses (Tables 16 to
18) and for spoken responses (Tables 19 to 21) is essentially the
same. Thus, there is no advantage to spoken responses, and spoken
responses do not produce a different pattern of confusions.

Tables 22 to 27 present the confusion matrices for two-syllable
words. The articulation index is slightly higher for two-syllable
words, but the confusion patterns remain essentially the same. There
is some tendency to confuse p and k clusters only with each other,
and not with t clusters; however, this is probably due to other
differences in the two-syllable words tested, i.e., a different vowel
and a different final consonant.

Tables 28 to 45 present confusion matrices for all test words
with a given vowel. The most common confusion, for all vowels, is
still a reversal of the consonant cluster. There is only one exception
to this tendency: when the vowel is /ɪ/, p clusters tend to be
confused with t clusters about as much as with each other.
TABLE 13

ALL RESPONSES — SIGNAL TO NOISE RATIO: +12 d.b.

AI: .8599

              TS          ST          PS          SP          KS          SK
TS    81.9 (181)    9.0 (20)     2.3 (5)           -     1.4 (3)    5.4 (12)
ST      6.5 (15)  78.8 (182)           -     1.3 (3)      .9 (2)   12.5 (29)
PS       1.7 (4)      .8 (2)  86.2 (206)    8.4 (20)     2.5 (6)      .4 (1)
SP        .4 (1)     3.5 (8)     3.5 (8)  77.3 (177)           -   15.3 (35)
KS             -      .4 (1)     1.3 (3)      .9 (2)  95.2 (219)     2.2 (5)
SK       1.3 (3)           -     1.3 (3)      .8 (2)      .4 (1)  96.2 (226)



TABLE 14

ALL RESPONSES — SIGNAL TO NOISE RATIO: 0 d.b.

AI: .4896

              TS          ST          PS          SP          KS          SK
TS    54.3 (109)   33.3 (67)     2.9 (6)      .5 (1)      .9 (2)    7.9 (16)
ST     45.3 (87)   39.6 (76)    5.2 (10)      .5 (1)      .5 (1)    8.9 (17)
PS     10.2 (19)    7.0 (13)   44.6 (83)   27.4 (51)     3.3 (6)    7.5 (14)
SP     14.5 (27)    7.0 (13)   23.7 (44)   37.1 (69)     4.3 (8)   13.4 (25)
KS      9.1 (16)     3.4 (6)     1.7 (3)     2.8 (5)  67.6 (119)   15.4 (27)
SK      7.7 (13)     2.9 (5)   14.3 (24)     5.4 (9)   17.9 (30)   51.6 (87)
TABLE 15

ALL RESPONSES — SIGNAL TO NOISE RATIO: -6 d.b.

AI: .3874

              TS          ST          PS          SP          KS          SK
TS     57.2 (88)   29.2 (45)     1.9 (3)     1.3 (2)     5.8 (9)     4.6 (7)
ST     55.7 (78)   32.2 (45)     1.4 (2)     1.4 (2)     5.0 (7)     4.3 (6)
PS     10.4 (18)     4.6 (8)   36.4 (63)   32.5 (56)    9.2 (16)    6.9 (12)
SP     19.7 (29)   13.6 (20)   13.6 (20)   29.9 (44)   11.6 (17)   11.6 (17)
KS      8.5 (12)     3.5 (5)    9.2 (13)    7.1 (10)   38.4 (54)   33.3 (47)
SK       4.9 (5)     2.9 (3)   14.7 (15)   15.7 (16)   24.5 (25)   37.3 (38)
38 
TABLE 16

ALL WRITTEN RESPONSES — SIGNAL TO NOISE RATIO: +12 d.b.

AI: .8529

              TS          ST          PS          SP          KS          SK
TS    81.3 (139)   10.5 (18)     2.3 (4)           -     1.2 (2)     4.7 (8)
ST      7.7 (14)  77.9 (141)           -     1.7 (3)     1.1 (2)   11.6 (21)
PS       1.6 (3)     1.1 (2)  85.1 (160)    9.6 (18)     2.1 (4)      .5 (1)
SP        .6 (1)     2.8 (5)     3.9 (7)  76.9 (137)           -   15.8 (28)
KS             -      .6 (1)     1.6 (3)     1.1 (2)  94.5 (169)     2.2 (4)
SK       1.6 (3)           -     1.7 (3)      .5 (1)      .5 (1)  95.6 (176)





TABLE 17

ALL WRITTEN RESPONSES — SIGNAL TO NOISE RATIO: 0 d.b.

AI: .5006

              TS          ST          PS          SP          KS          SK
TS     55.5 (87)   32.5 (51)     3.1 (5)      .6 (1)     1.3 (2)    7.0 (11)
ST     44.5 (65)   41.1 (60)     5.5 (8)      .7 (1)      .7 (1)    7.5 (11)
PS      7.9 (11)    7.9 (11)   42.5 (59)   29.5 (41)     2.9 (4)    9.3 (13)
SP     11.4 (16)    9.2 (13)   24.8 (35)   36.9 (52)     4.9 (7)   12.8 (18)
KS      8.9 (12)     3.7 (5)     2.2 (3)     1.5 (2)   68.6 (92)   14.9 (20)
SK       5.6 (7)     3.1 (4)   11.1 (14)     5.6 (7)   17.5 (22)   57.1 (72)
TABLE 18

ALL WRITTEN RESPONSES — SIGNAL TO NOISE RATIO: -6 d.b.

AI: .4121
TABLE 19

TOTAL SPOKEN RESPONSES — SIGNAL TO NOISE RATIO: +12 d.b.

AI: .8849

              TS          ST          PS          SP          KS          SK
TS       84 (42)       4 (2)       2 (1)           -       2 (1)       8 (4)
ST         2 (1)     82 (41)           -           -           -      16 (8)
PS       1.9 (1)           -   90.2 (46)     3.9 (2)     3.9 (2)           -
SP             -     5.9 (3)     1.9 (1)   78.5 (40)           -    13.7 (7)
TABLE 20

TOTAL SPOKEN RESPONSES — SIGNAL TO NOISE RATIO: 0 d.b.

AI: .4548
TABLE 21

TOTAL SPOKEN RESPONSES — SIGNAL TO NOISE RATIO: -6 d.b.

AI: .3266
TABLE 22

WRITTEN RESPONSES FOR TWO-SYLLABLE WORDS
SIGNAL TO NOISE RATIO: +12 d.b.
(blister/blitzer, Capsian/Caspian, axing/asking)

AI: .9598
TABLE 23

WRITTEN RESPONSES FOR TWO-SYLLABLE WORDS
SIGNAL TO NOISE RATIO: 0 d.b.
(blister/blitzer, Capsian/Caspian, axing/asking)

AI: .5730
TABLE 24

SPOKEN RESPONSES FOR TWO-SYLLABLE WORDS
SIGNAL TO NOISE RATIO: +12 d.b.
(blister/blitzer, Capsian/Caspian, axing/asking)

AI: .95

              TS          ST          PS          SP          KS          SK
TS      100 (10)           -           -           -           -           -
ST             -    100 (10)           -           -           -           -
PS             -           -    100 (10)           -           -           -
SP             -           -      10 (1)      90 (9)           -           -
KS             -           -           -           -      90 (9)      10 (1)
SK             -           -           -      10 (1)           -      90 (9)



TABLE 25

SPOKEN RESPONSES FOR TWO-SYLLABLE WORDS
SIGNAL TO NOISE RATIO: 0 d.b.
(blister/blitzer, Capsian/Caspian, axing/asking)

AI: .636
TABLE 26

WRITTEN RESPONSES FOR TWO-SYLLABLE WORDS
SIGNAL TO NOISE RATIO: -6 d.b.
(blister/blitzer, Capsian/Caspian, asking/axing)

AI: .4524
TABLE 27

SPOKEN RESPONSES FOR TWO-SYLLABLE WORDS
SIGNAL TO NOISE RATIO: -6 d.b.
(blister/blitzer, Capsian/Caspian, axing/asking)

AI: .3953



TABLE 28

SPOKEN RESPONSES FOR CɪD — SIGNAL TO NOISE RATIO: +12 d.b.
(blister/blitzer, lips/lisp, brisk/bricks)

AI: .9000
TABLE 29

SPOKEN RESPONSES FOR CæD — SIGNAL TO NOISE RATIO: +12 d.b.
(mats/mast, Blatz/blast, ax/ask, apse/asp, Max/mask, tax/task,
raps/rasp, claps/clasp, Capsian/Caspian, asking/axing)

AI: .8683
TABLE 30

SPOKEN RESPONSES FOR CuD AND CoʊD — SIGNAL TO NOISE RATIO: +12 d.b.

(coats/coast, boots/boost) 

AI: .9E87 
TABLE 31

SPOKEN RESPONSES FOR CɪD — SIGNAL TO NOISE RATIO: 0 d.b.
(blister/blitzer, lips/lisp, bricks/brisk)

AI: .4655
TABLE 32

SPOKEN RESPONSES FOR CæD — SIGNAL TO NOISE RATIO: 0 d.b.
(mats/mast, Blatz/blast, ax/ask, apse/asp, Max/mask, tax/task,
raps/rasp, claps/clasp, Capsian/Caspian, asking/axing)

AI: .4412
TABLE 33

SPOKEN RESPONSES FOR CuD AND CoʊD — SIGNAL TO NOISE RATIO: 0 d.b.
(coast/coats, boost/boots)

AI: .5000







TABLE 34

SPOKEN RESPONSES FOR CɪD — SIGNAL TO NOISE RATIO: -6 d.b.
(blister/blitzer, lips/lisp, bricks/brisk)

AI: .2791
TABLE 35

SPOKEN RESPONSES FOR CæD — SIGNAL TO NOISE RATIO: -6 d.b.
(mats/mast, Blatz/blast, ax/ask, apse/asp, Max/mask, tax/task,
raps/rasp, claps/clasp, Capsian/Caspian, asking/axing)

AI: .3136
TABLE 36

SPOKEN RESPONSES FOR CuD AND CoʊD — SIGNAL TO NOISE RATIO: -6 d.b.
(coats/coast, boots/boost)

AI: .4444
TABLE 37

WRITTEN RESPONSES FOR CæD — SIGNAL TO NOISE RATIO: -6 d.b.
(mats/mast, Blatz/blast, ax/ask, apse/asp, Max/mask, tax/task,
raps/rasp, claps/clasp, Capsian/Caspian, asking/axing)

AI: .4117
TABLE 38

WRITTEN RESPONSES FOR CɪD — SIGNAL TO NOISE RATIO: -6 d.b.
(lips/lisp, bricks/brisk, blister/blitzer)

AI: .3094
TABLE 39

WRITTEN RESPONSES FOR CuD AND CoʊD — SIGNAL TO NOISE RATIO: -6 d.b.
(boots/boost, coats/coast)

AI: .5398
TABLE 40

WRITTEN RESPONSES FOR CɪD — SIGNAL TO NOISE RATIO: 0 d.b.
(lips/lisp, bricks/brisk, blister/blitzer)

AI: .5515
TABLE 41

WRITTEN RESPONSES FOR CæD — SIGNAL TO NOISE RATIO: 0 d.b.
(mats/mast, Blatz/blast, ax/ask, apse/asp, Max/mask, tax/task,
raps/rasp, claps/clasp, Capsian/Caspian, asking/axing)

AI: .4991
TABLE 42

WRITTEN RESPONSES FOR CuD AND CoʊD — SIGNAL TO NOISE RATIO: 0 d.b.
(boost/boots, coast/coats)

AI: .4271
TABLE 43

WRITTEN RESPONSES FOR CæD — SIGNAL TO NOISE RATIO: +12 d.b.
(mats/mast, Blatz/blast, ax/ask, apse/asp, Max/mask, tax/task,
raps/rasp, claps/clasp, Capsian/Caspian, asking/axing)

AI: .8173
TABLE 44

WRITTEN RESPONSES FOR CɪD — SIGNAL TO NOISE RATIO: +12 d.b.
(lips/lisp, bricks/brisk, blister/blitzer)

AI: .8909
TABLE 45

WRITTEN RESPONSES FOR CuD AND CoʊD — SIGNAL TO NOISE RATIO: +12 d.b.
(boost/boots, coast/coats)

AI: .9518
For both p and t, the second formant transition would be negative
before /ɪ/ (as opposed to k). Perhaps this fact accounts for the
confusion.

Tables 46 to 51 present confusions for bi-morphemic words.
Apparently, the presence of a morpheme boundary does not deter
confusions; rather, mono-morphemic and bi-morphemic words produce
similar confusion patterns.

Reaction time: Reaction time was compared for the two different
signal-to-noise conditions, for words ending in different consonant
clusters, and for correct vs. incorrect responses.

Reaction time was significantly faster when the signal-to-noise
ratio was +12 d.b. than when the signal-to-noise ratio was 0 d.b.

As can be seen in Table 52, reaction time was consistently faster 
for correct responses than for incorrect responses, although the 
difference did not always reach statistical significance. 

When the reaction time to the individual consonant clusters is
examined, the reaction time is significantly slower to words ending in
ps, sp, and sk clusters when the signal-to-noise ratio is 0 d.b. When
the signal-to-noise ratio is +12 d.b., reaction time is significantly
slower only to words ending in ps clusters² (Table 53).

²This difference may be a result of the frequency of the words.
For example, apse is not even listed in An English Word Count
(Wright, 1965).

Finally, the reaction time to two-syllable words, when measured
from the beginning of the word, is about the same as the reaction time
to one-syllable words. When measured from the end of the word, the
reaction time is much shorter to two-syllable words. Apparently, subjects
begin to respond to the two-syllable words before they hear the whole
word, probably as soon as they hear the medial consonant cluster.

TABLE 46

WRITTEN RESPONSES FOR BI-MORPHEMIC WORDS — SIGNAL TO NOISE RATIO: +12 d.b.
(lips, claps, raps, bricks, coats, mats, boots)

AI: .8391

TABLE 47

WRITTEN RESPONSES FOR BI-MORPHEMIC WORDS — SIGNAL TO NOISE RATIO: 0 d.b.
(lips, claps, raps, bricks, coats, mats, boots)

AI: .4798

TABLE 48

WRITTEN RESPONSES FOR BI-MORPHEMIC WORDS — SIGNAL TO NOISE RATIO: -6 d.b.
(lips, claps, raps, bricks, coats, mats, boots)

AI: .4702

TABLE 49

SPOKEN RESPONSES FOR BI-MORPHEMIC WORDS — SIGNAL TO NOISE RATIO: +12 d.b.
(lips, claps, raps, bricks, coats, mats, boots)

AI: .8116

TABLE 50

SPOKEN RESPONSES FOR BI-MORPHEMIC WORDS — SIGNAL TO NOISE RATIO: 0 d.b.
(lips, claps, raps, bricks, coats, mats, boots)

AI: .1769

TABLE 51

SPOKEN RESPONSES FOR BI-MORPHEMIC WORDS — SIGNAL TO NOISE RATIO: -6 d.b.
(lips, claps, raps, bricks, coats, mats, boots)

AI: .4194

              TS          ST          PS          SP          KS          SK
TS       50 (13)    26.9 (7)     3.8 (1)           -     3.8 (1)    15.4 (4)
PS      26.9 (7)           -    30.8 (8)    26.9 (7)    11.6 (3)     3.8 (1)
KS        10 (1)           -           -           -      50 (5)      40 (4)

TABLE 52

TABLE 53

REACTION TIME, IN MSEC., TO CONSONANT CLUSTER
(mono-syllabic words only)

Discussion 

The finding that has the most bearing on the perception of consonant
clusters is that reversal errors are the most common errors. This
finding is counter to the idea that the phoneme is the minimal perceptual
unit; if consonant clusters are perceived "phoneme-by-phoneme," then,
when a listener hears the consonant cluster sp, he first hears s and then
he hears p. Given that he hears these in a particular order, there is
no reason for him to reverse that order. Granted, he might on occasion
forget the order, but there is no reason to suppose that he would be
more likely to forget the order of the consonants than to forget one
of the consonants; thus, reversal errors would be no more common than
substitution errors. However, that is clearly not the case: reversal
errors are much more common. This finding implies that some special
perceptual mechanisms must be postulated for the perception of consonant
clusters.

Broadbent and Ladefoged's suggestion appears of doubtful validity,
not because the consonant cluster data contradict it, but for other
reasons. As has already been pointed out by Neisser, a listener is not
limited to an invariant, time-determined chunk of input that he can
process. This is implied by the ability of listeners to perceive
correctly speech that is speeded up. Broadbent and Ladefoged would
have to claim that order errors would become more common, and involve
more segments, as speech is speeded up, since each "time chunk"
would contain more segments. But that this is not the case seems clear
from personal experience with record players.

Neisser's suggestion, that a consonant cluster is a perceptual
unit, and Wickelgren's suggestion, that a consonant cluster is coded in
terms of some element very much like an allophone, are both compatible
with the data.

If consonant clusters are perceptual units, then clearly a ps
cluster is most similar to an sp cluster. If this is so, then, when the
signal is degraded by the addition of noise, the items that are most
similar to each other will be confused most; thus, reversal errors will
be most likely.

If a consonant cluster is coded in terms of allophones, then the
allophone of s before p will be slightly different, acoustically, from
the allophone of s after p. This difference, however, will be the most
subtle part of the signal; in particular, it will be smaller than the
acoustic information differentiating consonants from each other. These
small acoustic differences will be the first to disappear when the signal
is degraded by noise; consequently, reversal errors will be the most
common in a degraded signal.

Thus, either Neisser's or Wickelgren's suggestion will account for
the observed result.
CHAPTER FOUR



SYNTACTIC UNITS IN PERCEPTION 

Experiments involving the localization of "clicks" in sentences
have been used by Bever, Fodor, and others (Fodor and Bever, 1965;
Bever, Lackner and Kirk, 1969) to examine syntactic units in perception.
The experiments are based on a phenomenon discovered by Ladefoged
and Broadbent (1960): subjects have great difficulty localizing
a click in speech when the click and speech are presented
simultaneously.

At first, the "click" experiments seemed to support the view
that syntactic constituents were perceptual units: when asked to
locate a click, subjects tended to move it towards a constituent
boundary. A theory of perception was developed to explain the
phenomenon: a subject could pay attention to only one thing at a time,
processing either the speech or the click; subjects would not
interrupt perceptual units of speech; consequently, subjects would
tend to locate the click between perceptual units.

However, the click-locating task, as defined in the early experiments, 
involved a complex interaction of perception and memory, since the 
subject had to remember the sentence he had just heard, remember where 
the click had occurred, and locate the click in a written version of 
the sentence. 

Reaction time is a response measure that is more directly linked
to perception in that the subject is not required to remember the click
location. But when reaction time to clicks was measured, it was
found that reaction time was not shortest to clicks located in
constituent boundaries, as the theory would predict, and furthermore,
reaction time did not seem to be related to the syntactic structure of
a sentence (Abrams and Bever, 1970).

In order to explain this finding, Abrams and Bever suggest a
different model of attention in speech perception; they argue that the
latency of the response to the click is a function of a subject's
over-all attention to sensory input. At the beginning of a clause,
the subject must pay attention to the input very closely, hence his
reaction time to clicks is fast. At the end of clauses, the subject
can already predict much of what is to come, so he does not have to
pay much attention, and his reaction time to clicks is slower.

But it is also possible that constituent structure is not 
directly involved in perception, but is a result of perceptual analysis. 
It is possible that reaction time is a function of the suprasegmental 
structure of a sentence, as suggested by Dr. Lehiste (personal 
communication).

An experiment was designed to test a part of this hypothesis, 
namely to determine whether reaction time to clicks is affected by 
their relation to stressed elements.

Method

Stimuli: Ten sentences were selected to serve as stimuli. Each
sentence was recorded two times in random order. Sentences were
separated by a pause of 5 seconds. The recording was made in a
sound-proof booth on an Ampex 350 tape recorder, at 7 1/2 i.p.s.

The speaker was male, with a medium-pitched voice. He was
instructed to say the sentences clearly and naturally.

One click was placed in each sentence. There were four types 
of click location: in a stressed vowel, in an unstressed vowel, in the 
consonant preceding a stressed vowel, and in the consonant preceding 
an unstressed vowel. In addition, one click was located in a constituent 
boundary. The clicks were produced by a capacitor discharge, triggered 
by the release of a key. The click so produced was a single spike, 
with a very rapid rise and decay. The duration of each click was 
approximately 25 msec. 

The stimulus tape was made by re-recording the sentences on one
channel of an Ampex 354 tape recorder and recording the click, at the
appropriate time, on the second channel. In addition, five clicks
were recorded on the stimulus tape before the clicks which were
associated with sentences, to determine each subject's reaction time
to non-speech stimuli.

The sentences employed, and the location of the clicks, are 
given below. For convenience, the location of clicks in both 
productions of the sentence is shown in one written version of the 
sentences. The complex sentences are taken from the study conducted 
by Abrams and Bever (1970); the simple sentences are taken from a 
study conducted by Lehiste (1971).



1. That the matter was dealt with fast, was a surprise
to Harry.

2. Since she was free that day, her friends asked her to come.

3. My sleep was disturbed.

4. By making his plan known, Jim brought out the objections
of everybody.

5. Speed kills.

6. Any student who is bright but young, would not have seen it.

7. The speed was controlled.

8. Sleep refreshes.

9. If you did call up Bill, I thank you for your trouble.

10. After the dry summer of that year, some of the crops were
completely lost.

Click location was verified by inspecting the oscillograms, produced 
by two channels of an Elema-Schonander Mingograf, representing the two 
channels of the stimulus tape. 

Subjects: Eleven subjects participated in the experiment. All were
members of The Ohio State University linguistics department.

Procedure: Each subject listened to the stimulus tape two times. The
first time, he was instructed to listen to the sentences and to push
a key as quickly as he could when he heard the click. The key triggered
a capacitor discharge which was recorded directly on one channel of
an Elema-Schonander Mingograf. Simultaneously, the channel of the
stimulus tape which contained the clicks was recorded on another
channel of the Mingograf. The instrumentation is shown in the
accompanying diagram (Fig. 9). Paper speed was 100 mm per second.
Immediately after the first test, the subject listened to the
tape again. This time, he was provided with a written copy of each 
sentence and asked to mark the location of each click. 

Reaction time to clicks was determined by measuring from the 
peak of the stimulus click to the onset of the response. 

Results 

The reaction time to clicks was compared for four conditions:
when the click occurred in a stressed vowel, when it occurred in an
unstressed vowel, when it occurred in a consonant preceding a stressed
vowel, and when it occurred in a consonant preceding an unstressed
vowel. The results are presented in Table 54 and in Figs. 10 to 12.

Fig. 10 shows the reaction time to a click embedded in a consonant 
preceding a stressed vowel, and in a consonant preceding an unstressed 
vowel. For all but one subject, the reaction time is faster to the 
click preceding an unstressed vowel. Fig. 11 shows reaction time to 
clicks embedded in stressed vowels and to clicks embedded in
unstressed vowels. For six subjects, the reaction time is faster 
to a click in an unstressed vowel; for the other subjects, the reaction 
time is essentially the same. 




Fig. 10. Reaction time to clicks in consonants preceding
stressed vowels and to clicks in consonants preceding
unstressed vowels (subjects 1 to 11).

Fig. 11. Reaction time to clicks in stressed vowels and
in unstressed vowels (subjects 1 to 11).

Although the differences are not always statistically significant, 
the tendency is clear: reaction time to clicks is affected by their 
location in relation to stressed elements. Reaction time to a click
is longest when the click is in the vicinity of a stressed element, 
either in a stressed vowel or in a consonant preceding a stressed 
vowel. Reaction time is shorter when the click is in the vicinity of 
an unstressed element, either in an unstressed vowel or in a consonant 
preceding an unstressed vowel. 

The reaction time to clicks located at constituent boundaries is
quite variable. For some subjects, it is very short in this condition,
approaching the reaction time to non-speech stimuli. For other subjects,
it is quite long, longer than the reaction time to clicks in any
other condition.

Reaction time to non-speech clicks is short in all cases,
implying that reacting to a click in a speech context is more complex
than simply reacting to a click. These results are presented in Fig. 12.

Fig. 12. Simple reaction time to click, and reaction time
to click at a constituent boundary.



There is considerable variation in reaction time between subjects:
subject 6, in particular, has quite slow reaction times in all conditions.
Nevertheless, for each subject, the reaction times stand in the same
relationships, depending on the location of the click.

Click localization: The results of the click localization test are, in
general, in agreement with previous studies. Click localization tends
to be accurate when the click occurs at a constituent boundary. This
is shown in Fig. 13. The asterisk indicates the location of the click;
the bar graph indicates the subjects' localization of the click.
Fig. 13. Click localization when the click occurs at
a constituent boundary. (Example: ...call up Bill,* I thank you...)

There is also a tendency for subjects to move clicks towards deep
structure constituent boundaries and to locate clicks between words.
These results are shown in Fig. 14 for some typical sentences.

Fig. 14. (Example: ...dealt with fast, was a surprise...)
However, the location of stress also affects click localization.
Clicks in stressed vowels are localized much more accurately than
clicks in unstressed vowels. This can be seen clearly in
Fig. 15: the click in the stressed vowel of sleep is localized
correctly more often than the click in the unstressed vowel of was.
Furthermore, subjects miss the correct location by less for
the click in the stressed vowel than for the click in the unstressed
vowel.




Fig. 15. Click localization in stressed and unstressed
vowels.

Accuracy of click localization is summarized in Table 55.



TABLE 55 

CLICK LOCALIZATION: PER CENT CORRECT 
Discussion

The click localization data seem to imply that click localization 
is controlled by two parameters, constituent structure and the presence 
of stress. Click localization errors tend to lie in the direction 
predicted by theory, but clicks are less likely to be moved from a 
stressed vowel than from an unstressed vowel. That localization of 
clicks in consonants is also inaccurate may simply be a result of 
response bias: subjects may be less inclined to locate a click in a 
consonant. However, it may also result from the fact that the duration 
of consonants is short in relation to the duration of clicks. 

The observed differences in reaction time imply that suprasegmental
structure has some function in defining the units of speech perception. 
Since reaction time is not directly affected by constituent structure, 
it can be inferred that constituent structure does not define the units 
of perceptual input. Instead, the data support the hypothesis that 
units of perceptual input are defined by suprasegmental structure, 
i.e. stress and intonation. 

There is one objection that might be raised to this conclusion. 
Stressed vowels occur in words that have semantic content whereas 
unstressed vowels occur in words that have less semantic content. In 
other words, words with stressed vowels are not predictable from 
context while words with unstressed vowels are much more readily 
predictable. The experiment, as designed, does not explicitly 
differentiate between this effect and the presence of stress. However, 
the objection is not crucial, because the effect on reaction time is
quite as pronounced when the click is in the consonant preceding the
vowel. It is difficult to see why a subject should react differently
to these clicks if only the predictability of the word were at
issue. Further testing is necessary, however, to rule out the
"predictability hypothesis" completely.
CHAPTER FIVE



CONCLUSION 

The results of the studies reported above are interesting in 
themselves, but they are also interesting in what they imply about
the processes underlying speech perception. To summarize briefly, 
the results are the following: 

1. Subjects are aware of sub-phonemic phonetic differences, at
least under appropriate conditions, but cannot make linguistic use
of them.

2. Perception of at least some phonological segments involves
special perceptual mechanisms, rather than proceeding segment by
segment.

3. Syntactic units in perception may be defined by suprasegmental
structure.



The Need for Perceptual Units 

Before the implications of these findings for specific theories
of speech perception are discussed, it seems reasonable to reexamine
the basic assumption of this study, namely that there are units in
speech perception.

As Experiment I shows, subjects can become aware of very fine
phonetic differences if they attend to a particular utterance with
great care. It is likely that subjects could even be taught to
identify most of the words used in Experiment I correctly, provided
that they received proper feedback, and provided that the
stimuli were selected so that the distinctive cues were
invariably present in each production. In this sense, there is no
clear lower limit below which speech stimuli are perceived as "the
same," and, one might suppose, no lower limit for a phonological
perceptual unit either.

However, the fact that a listener can utilize fine phonetic detail
when the conditions of a test force him to do so does not imply that
listeners inevitably notice or pay attention to such information.

Rather, listeners are probably content with less detailed phonetic
representations. To draw an analogy with visual perception, we do
not examine leaves when we are looking at a forest. In visual
perception, we can examine, in great detail, the shape and color of
particular objects. But ordinarily, we do not do this; we are content
to recognize objects and to behave appropriately toward them: we sit in
chairs, pat dogs, speak to our friends. Similarly, in the ordinary
course of language use, we deal with something other than fine
phonetic differences. Therefore, some larger unit, or higher level,
must be postulated at which the phonological structure of an
utterance is represented, independently of the fine phonetic details
of the utterance.

This level, however, must be independent of syntactic or contextual
information, because new words, such as proper names and
technical terms, do not present undue difficulty to us; we simply
hear the word, and we remember it.

These two considerations imply a lower and an upper boundary
for the perception and coding of phonological information: the units
involved in this process cannot be equivalent to the phonetic
representation of the utterance, and they cannot be dependent
on syntactic information.

Similarly, there must be some unit, or preferred units, in 
arriving at a syntactic analysis of a sentence. It is not possible
for listeners to store a whole sentence in memory, simply because, 
unless the sentence were recoded in some way, it would very easily 
exceed the short-term memory capacity of a listener. It seems 
reasonable to suppose that the recoding operation can not process 
the sentence continuously as it is heard, but that the sentence must 
be broken up into some sort of units — perceptual segmentation units — 
for the recoding process to operate upon. The results of the recoding 
process certainly embody syntactic structure in some way. 

It has been supposed previously that the perceptual segmentation 
units were syntactic as well. But the results of Experiment III can 
not be reconciled with the idea that segmentation units are syntactic. 
If they were, then reaction time to clicks and click localization 
should give the same results. Since this is not the case, the 
implication is that, at some level, sentences are processed in terms
of non-syntactic units. The results of Experiment III imply that
these units are defined by the phonological structure of an utterance 
and that these units function at the initial segmentation of the 
sentence. These initially segmented units are then recoded, probably 
by assigning them a particular syntactic function. 
Thus, there is a need for at least two types of units in speech
perception: units of phonological processing, and units defining a
part of a sentence for further syntactic analysis (perceptual
segmentation units).

Implications for Perception Models

Not all of the theories of speech perception discussed in Chapter
I make specific predictions about units of speech perception, but
several do, namely the motor theory, analysis-by-synthesis, "filtering"
theories, Osgood's perception model, and the perceptual strategies
model. The experimental findings reported above conflict with
some predictions made by these models, although the models may, of
course, be revised slightly to accommodate them.

First, the motor theory of speech perception, in that it asserts
categorial perception of phonemes, conflicts with a listener's ability
to become aware of sub-phonemic phonetic differences. If the perception
of phonemes were indeed categorial, then listeners could not become
aware of any sub-phonemic information whatever. Yet this is not the
case; listeners are aware of sub-phonemic detail and use both vowel
length and consonant quality in developing a strategy for making
identification judgments. Second, in that the motor theory postulates a
phoneme-like unit as the basic unit of perception, it conflicts with
the implications of Experiment II: listeners apparently employ
special perceptual mechanisms to process some consonant clusters,
rather than perceiving the clusters "phoneme by phoneme."

This second objection also applies to analysis-by-synthesis
models. These models assume that phonology is perceived in terms of
discrete segments. This assumption cannot account for the finding
of Experiment II that reversal of the order of segments is the
most common perceptual error.

In a fundamental way, the motor theory and analysis-by-synthesis
are quite similar: both postulate that the listener generates a
possible phonetic output and matches this output against the incoming 
message. The theories differ only in the nature of the internal 
mechanisms that they postulate. The experiments reported in this 
work do not have any implications for this basic postulate. However, 
it must be added here that there is no evidence that such internal 
mechanisms are strictly necessary. The "synthesis" theories have 
been postulated, apparently, because there are no invariants given 
immediately in the acoustic speech signal. Instead, the relationship 
between the acoustic signal and the perceptual result is quite complex. 

Still, this difficulty is not unique to speech perception. In
the study of visual perception, it has been commonly observed that the
retinal image (which we may consider to be analogous to the acoustic
input) is much more varied than the perception of objects. The
retinal image changes radically as we view an object from different 
angles and from different distances, yet the percept is of an 
unchanging, stable object. The relationship between the retinal image 
and the percept is no less complex than the relationship between the 
acoustic signal and perceived speech, yet we do not posit a "motor 
theory of visual perception" for this reason. 

These comments are added only to point out that a complex
relationship is not sufficient grounds for positing intermediate
devices of an unrelated type: theoretical mechanisms must have
independent empirical justification.

The filtering theories discussed in Chapter I are of two types:
theories that assume a phoneme-like unit, and Wickelgren's
context-sensitive coding, which assumes that the perceptual unit is
similar to the traditional allophone. There are two objections to the
phoneme-like unit: first, the well-known lack of invariance between phonemes
and the acoustic signal and, second, the fact that obstruent clusters
are apparently not perceived "phoneme by phoneme."

Wickelgren's theory tries to overcome the first difficulty by
assuming smaller, hence presumably invariant, units, but it does so at
the cost of proliferating the number of different units that must be
assumed. Furthermore, it remains to be determined whether there are invariant
acoustic differences that can be used to determine the order of segments.
Context-sensitive coding can, however, account for the perception of
obstruent clusters. One further advantage of both types of filtering
theories must be mentioned. Neither version of the theories is limited
to a strict sequence of segments in the input, if the "filters" can be
assumed to work in parallel. Rather, the listener can be
presumed to process a fairly large stretch of speech at one time.

Osgood argues that the word is the basic perceptual unit. However, 
there are several difficulties with this position. First, as has 
already been pointed out, there must be some perceptual units which
enable a listener to code a new word. It would be unparsimonious 
to suppose that these mechanisms are used only to code new words. 

Second, listeners can become aware of very subtle phonetic differences,
a finding which runs counter to the notion that the word is the only
perceptual unit. Still, it seems likely that words function as units
at some level of speech perception.

The perception of syntactic structure has been touched on only 
briefly in this study. The perceptual strategies suggested by Bever,
and others, are not in dispute here; a fair amount of evidence has 
been offered to substantiate them, and no finding presented in this 
work conflicts with them. What has been questioned is the assumption 
that syntactic units provide the initial segmentation of a sentence. 

As has already been pointed out, this can not be the case because 
reaction time and click localization do not give the same results. 
Rather, the most likely hypothesis is that initial segmentation is 
accomplished by using the suprasegmental structure of an utterance. 
After this initial segmentation, perceptual strategies, as defined by 
Bever, may well apply to enable the listener to arrive at a syntactic 
analysis of the utterance. 

The remarkable fact about speech perception is that it seems to 
be an easy and effortless process. Yet the mechanisms underlying this 
process are only beginning to be studied. Perhaps the best that can
be said is that we are beginning to appreciate how complicated and
mysterious the process of speech perception really is. Any adequate
explanation will undoubtedly require a much more thorough understanding
of human cognitive abilities on the one hand, and of the nature of
language on the other.
Abstract

This paper is concerned with the effect of morphological and
syntactic boundaries on the temporal structure of spoken utterances. 

Two speakers produced twenty tokens each of four sets of words consisting 
of a monosyllabic base form, disyllabic and trisyllabic words derived 
from the base by the addition of suffixes , and three short sentences 
in which the base form was followed by a syntactic boundary, this in 
turn followed by a stressed syllable, one unstressed syllable, and two 
unstressed syllables. The sentences thus reproduced the syllabic 
sequences of the derived words. The duration of words and segments was 
measured from oscillograms. The manifestation of morphological and 
syntactic boundaries is discussed, and some implications of the 
findings relative to the temporal programming of spoken utterances are 
considered. 



0. Introduction 

This paper is concerned with the effect of morphological and 
syntactic boundaries on the temporal structure of spoken utterances. 

The investigation was prompted by the observation, made in the course of
a previous study, that the duration of a word may be considerably
reduced if a derivational suffix is added to the word constituting the
base. In this earlier study, the words stead, skid and skit were compared
with steady, skiddy and skitty. It might have been expected that the
latter set would be longer than the former by the average duration of
the derivational suffix. Instead, it turned out that the duration of the
base part of the derived word was considerably shortened, so that even
with the addition of a fairly long -y, the overall duration of the derived
words was not much different from that of the base words.

In the current study, four sets of words were examined, built
around the base forms stick, sleep, shade, and speed. Each of the
words occurred by itself and in eight additional utterance types.

Five derivational suffixes were used, three of them monosyllabic
and two disyllabic. The words were further placed in short sentences
in which they were followed by a major syntactic boundary: the
boundary between the noun phrase functioning as subject and the
verb phrase functioning as predicate. In the first sentence type, the
verb phrase either consisted of a stressed monosyllable (in three cases)
or started with a stressed syllable (in one case); in the other two
types, it started with one or two unstressed syllables, respectively.
The sentences thus reproduced the syllabic sequences of the derived
words. The purpose of the study was to explore whether there are any
differences in the durations of the base, depending on whether it is
followed by a morpheme boundary within the same word, or by a major
syntactic boundary coinciding with the word boundary.



I. Method



The test material, presented in Table I, was recorded by two
speakers, R.G. (male) and L.S. (female), both graduate students at
The Ohio State University. The recordings were made under standard
conditions in an anechoic chamber using reliable recording equipment.
The utterances were produced in two ways, to test the comparability
of different contexts and to vary the fairly artificial recording
technique of repeating the same word a large number of times. One
of the ways was indeed the repetition technique: each word was uttered
ten times at a subjectively constant rate. Then each
set, consisting of base word, derived words, and three short sentences,
was read ten times in succession. Each speaker thus produced 20
tokens of each word, for a total of 720 utterances by each speaker.
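Enumerating the design confirms the token count: 4 bases × 9 utterance types × 2 production methods × 10 repetitions. The short utterance-type labels below are illustrative names. They are not the authors' notation.

```python
from itertools import product

bases = ["stick", "sleep", "shade", "speed"]
# The base alone, five suffixed forms, and three sentence frames (illustrative labels).
utterance_types = ["base", "-y", "-er", "-ing", "-ily", "-iness",
                   "# stressed", "# one unstressed", "# two unstressed"]
modes = ["repeated in isolation", "read as a set"]  # the two production methods
repetitions = range(10)  # ten tokens per word in each method

design = list(product(bases, utterance_types, modes, repetitions))
tokens_per_word = len(modes) * len(repetitions)

print(tokens_per_word)  # 20
print(len(design))      # 720 utterances per speaker
```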

The durations of words and segments were measured from 
oscillograms, produced by processing the recorded tapes through a 
Frøkjær-Jensen Trans-Pitch Meter and Intensity Meter, connected to a
four-channel Elema-Schönander Mingograph. The material was analyzed
statistically, using the IBM 360 Model 75 computer available at The 
Ohio State University Instruction and Research Computer Center. 



II. Comparability of the Two Sets of Data



For both sets of data, the following computations were carried
out: the mean duration of each segment; the mean duration of each
word; the mean duration of the base component of the derived word
(e.g., stick in sticky); and the variances and standard deviations of
each segment and word. The differences between the corresponding
means for each segment and word were tested for significance according
to the formula:



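$$
Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}} \qquad (1)
$$

The difference in variability between the two sets was tested by two
(related) measures:³

$$
H = \frac{\sigma^2_{\mathrm{MAX}}}{\sigma^2_{\mathrm{MIN}}}, \qquad
C = \frac{\sigma^2_{\mathrm{MAX}}}{\sigma^2_{\mathrm{MAX}} + \sigma^2_{\mathrm{MIN}}} \qquad (2)
$$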

For the given number of tokens, the critical values (at the 95%
confidence level) were 1.960 for Z, 4.030 for H, and 0.801 for C.
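The three statistics in (1) and (2) can be sketched in Python as shown below. This is an illustration, not the authors' program. It assumes sample variances (with an n - 1 denominator) from the standard library, which the text does not specify, and it uses made-up toy numbers. With only two sets, C equals H/(1 + H), which is why the two measures are related.

```python
import math
from statistics import mean, variance


def z_statistic(x1, x2):
    """Eq. (1): difference of means over the standard error of the difference."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / se


def hartley_h(x1, x2):
    """Eq. (2), first measure: larger variance divided by smaller variance."""
    v1, v2 = variance(x1), variance(x2)
    return max(v1, v2) / min(v1, v2)


def cochran_c(x1, x2):
    """Eq. (2), second measure: larger variance divided by the sum of both."""
    v1, v2 = variance(x1), variance(x2)
    return max(v1, v2) / (v1 + v2)


# Toy numbers, not durations from the study.
set_1 = [1, 2, 3, 4]
set_2 = [2, 4, 6, 8]
h = hartley_h(set_1, set_2)
print(z_statistic(set_1, set_2), h, cochran_c(set_1, set_2), h / (1 + h))
```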
It was found that the differences between the two sets of
utterances for each speaker were random, and that there was minimal
overlap between the two speakers in cases of statistically significant
differences. Out of 196 pairwise comparisons of $\bar{X}_1$ and $\bar{X}_2$, speaker
R.G. had 65 significant differences, speaker L.S. 08 significant
differences; the same segments were involved in 35 instances, but
these segments constituted no natural set: there was no discernible
system. A separate check of syllable nuclei showed 11 instances for
R.G. and 26 instances for L.S. in which the means differed significantly,
i.e. Z was higher than the critical value. The same syllable nucleus
was involved in 9 instances. As regards the differences in variability
between the two sets, speaker R.G. had 15 (out of 196) cases in which
the difference in variances between the two sets was significant;
speaker L.S. had 38 instances, of which 9 involved the same segment
for both speakers. As far as syllable nuclei were concerned, speaker
R.G. had 2 instances of significant differences, L.S. with an overlap
of 2.

Combining the two sets would tend to increase the extreme ranges
for each combined set of utterances and thus increase the variability;
but since the difference in variability between the two sets was
negligible, it was decided to combine the two sets in subsequent
calculations. The resulting increase in variability was in fact
quite small. It is hoped that producing the test utterances in the
two different ways described above has reduced the artificiality of
the situation in which long lists of words are produced out of
context, and that the results are therefore more applicable to
natural speech situations.

III. Effect of Morpheme Boundaries 

In order to study the effect of morpheme boundaries (and word
boundaries) on the duration of the base to which derivative suffixes
were added, B/D ratios were computed. This term refers to the ratio
of the duration of the base word (produced by itself) to the sum
of the durations of the same segments occurring in the derived word
(e.g., the mean duration of stick would be divided by the mean
duration of the stick part of the word sticky). These ratios were
calculated for all test words and, separately, for the syllable
nuclei in all test words. The differences between the means were
highly significant in all instances; the Z-values, which were always
higher than the critical value, are not included in the tables.
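The following minimal sketch shows the B/D computation. The function name and the duration values are hypothetical. The ratio divides the mean duration of the base spoken alone by the mean duration of the same segments inside a derived word or sentence. A value of 1 means no change, and values above 1 mean the base was shortened.

```python
from statistics import mean


def bd_ratio(base_alone_ms, base_in_derived_ms):
    """Mean duration of the isolated base divided by the mean duration of the
    same segments in the derived form (hypothetical helper)."""
    return mean(base_alone_ms) / mean(base_in_derived_ms)


# Hypothetical token durations in ms (not data from the study).
stick_alone = [400, 410, 390]
stick_in_sticky = [250, 260, 240]

print(bd_ratio(stick_alone, stick_alone))      # 1.0, i.e. Base/Base
print(bd_ratio(stick_alone, stick_in_sticky))  # 1.6: the stem is shorter in sticky
```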

The results are presented in Tables II-V and graphically in Figures
1-4. The tables are self-explanatory; a few words of explanation
may be needed for the figures.

On each figure, the derived word types and sentence types are
given on the vertical axis. The horizontal axis is calibrated to
show increasing B/D ratios. Points representing B/D ratios for
words are connected with solid lines; points representing B/D ratios
for syllable nuclei are connected with dashed lines. The curves
start in the top left-hand corner at the B/D value 1, since Base/Base
yields a ratio of 1. Increasing ratios show a decrease in the duration
of the base component of the derived word or of its syllable nucleus,
respectively.
Several observations may be made regarding the figures. In no
case was the duration of the same set of segments greater in a
derived word than in the base form. The suffixes -y, -er and -ing
seem to be equivalent with respect to their effect on the duration
of the stem. It appears that the number of segments in the suffix
has no systematic effect on the duration of the stem. This observation
is confirmed by looking at the behavior of stem forms before the
suffix -ily. This suffix was in fact pronounced with a syllabic
/l/ by both speakers in all productions; thus the stems of words like
sticking and stickily were each followed by two segments, but the
-ing suffix was monosyllabic and the -ily suffix was disyllabic.

In all cases, the disyllabic -ily suffix produced greater reduction 
in the duration of the stem than the monosyllabic suffix -ing , 
although both consisted of the same number of segments.

The suffix -iness constitutes a special case. In each instance, 
the B/D ratio was greatest under this condition. This is a disyllabic 
suffix, as is -ily; however, its rhythmic structure is considerably
different. It seems possible that in the case of the -iness suffix
we are dealing with two cycles of derivation: that, for example,
sticky is derived from stick in the first cycle, and stickiness
from sticky in the second cycle. If this is so, then the ratios
stick/sticky and sticky/stickiness (involving the base forms stick
and sticky respectively) should be approximately equal. Some
support for this assumption may indeed be found in Table VI, which 
presents the pertinent ratios. 
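The two-cycle hypothesis amounts to a numerical check. The first-cycle ratio (stick relative to the stick part of sticky) should be close to the second-cycle ratio (sticky relative to the sticky part of stickiness). The sketch below uses hypothetical mean durations. The 10% tolerance standing in for "approximately equal" is an arbitrary assumption and is not a criterion from the study.

```python
def cycle_ratios(base_alone, base_in_first, derived_alone, derived_in_second):
    """Return the first-cycle and second-cycle ratios (hypothetical helper)."""
    return base_alone / base_in_first, derived_alone / derived_in_second


def roughly_equal(a, b, rel_tol=0.10):
    """Arbitrary tolerance standing in for 'approximately equal'."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


# Hypothetical mean durations in ms: stick, stick in sticky, sticky, sticky in stickiness.
first, second = cycle_ratios(400.0, 250.0, 520.0, 340.0)
print(round(first, 2), round(second, 2), roughly_equal(first, second))
```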

A comparison of the curves for words with the curves for
syllable nuclei indicates that the reduction in the duration of a
stem in the derived form is achieved more at the expense of vowels
than at the expense of consonants. The nature of the vowel and that of
the postvocalic consonant seem to play equally important roles.
Intrinsically long syllable nuclei (like those in sleep, speed, and
shade) are more compressible than intrinsically short syllable nuclei
(as in stick). But /i/ in sleep, followed by a voiceless plosive,
is much less compressible than /i/ in speed and /eɪ/ in shade.
Tendencies toward reduction under a given condition become
accentuated when one looks at the most compressible segments: for both
speakers, the greatest effects of the various positions are manifested
in the syllable nuclei of speed and shade.

IV. Effect of Syntactic Boundaries 

One of the hypotheses tested in this experiment was that
syntactic boundaries would have temporal effects clearly
distinct from those of morpheme boundaries. However, the results of
this study show that, as far as the temporal structure of utterances
is concerned, the effects of morpheme boundaries and those of syntactic
boundaries cannot be separated from each other. Furthermore, it is
not certain that the boundaries as such have any effect at all, since
the temporal structure of the utterances seems to depend most of all
on their syllabic structure, regardless of the nature of the
boundaries involved.

In sentences like Speed kills, we find durations of the test
word that are very similar to those in disyllabic bimorphemic words;
sentences like The speed increased most closely resemble words like
speediness, with an unstressed short syllable followed by a relatively
long syllable. The addition of another unstressed syllable may have a
further reducing effect, but the data are not consistent on this
point. The major results here are the absence of any clear differences
between the effects of morpheme boundaries and syntactic boundaries,
and the likelihood that the durational structure is conditioned by
the number of syllables rather than by the number of segments
or by the presence of boundaries.

V. Generality of the Findings 

One of the ways to test the results would be to form predictions 
on the basis of these data and then compare the predictions with 
further observations. I intend to record other sets of words by 
the same speakers as well as the same sets of words by different 
speakers, and calculate the goodness of fit between predicted and 
observed B/D ratios. The basis for predictions might be Table VII, 
which combines words that seem to behave in a similar fashion for 
the two speakers.



VI. Discussion 



The results of this study confirm earlier studies in some
respects but differ from them in certain important ways.

Bolinger⁴ stated that long syllables tend to acquire extra
length if followed by another long syllable (long syllables being 
those that contain a full vowel); if followed by a short syllable, 
long syllables cannot acquire that extra length and therefore appear 
shorter. This process tends to ignore morpheme and word boundaries, 
and may take place across a syntactic boundary. 

The present study confirms Bolinger's notion that temporal 
readjustment processes tend to ignore morpheme and word boundaries. 
The shortening of a long syllable before a short syllable is likewise 
confirmed in all the data. However, in sentences of the type Speed
kills, the word speed (and words in analogous sentences) certainly
did not acquire any extra length, at least in comparison to isolated 
productions of the same word. 

Gaitenby⁵ found a common ratio of segment-to-utterance length
for all dialects of American English sampled in her study. When
segment durations were converted to percentages of total utterance
time, it was found that 90% of all the segments varied less than 5.3%
for any speaker. The longer the utterance in terms of number of 
segments, the shorter the absolute duration of any given segment, 
until an approximate minimum duration was reached beyond which 
segments could not be compressed any further. She noted also that 
words immediately preceding a pause tended to expand in utterances 
of all lengths. According to Gaitenby, it would thus be the word 
closest to the pause that would acquire extra length, while in longer 
utterances, the preceding parts of the sentence would be produced at 
a faster rate. This seems to be borne out by the findings: in the 
three sentences, the base word became successively shorter, the 
farther it was removed from the end of the sentence. A difference 
between Gaitenby's results and those obtained in this study is the
observation that utterance length should be determined with reference
to the number and type of syllables rather than with reference to the
number of segments.
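Gaitenby's normalization expresses each segment's duration as a percentage of total utterance time. The minimal sketch below uses hypothetical segment durations for a single token of stick. Both the segmentation and the values are illustrative only.

```python
def percent_of_utterance(segment_durations_ms):
    """Express each segment duration as a percentage of the whole utterance."""
    total = sum(segment_durations_ms)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return [100.0 * d / total for d in segment_durations_ms]


# Hypothetical durations (ms) for /s/, /t/, /ɪ/, /k/ in one token of "stick".
segments = [120, 60, 90, 130]
shares = percent_of_utterance(segments)
print(shares)       # [30.0, 15.0, 22.5, 32.5]
print(sum(shares))  # 100.0
```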

Chomsky and Halle⁶ have postulated a hierarchy of boundaries
which delimit linguistic units that serve as domains of application
of different kinds of phonological rules. Although the authors are
careful to state that phonetic effects need not be associated with 
(word) boundaries, the postulation of a hierarchy of boundaries 
naturally prompts a phonetician to look for possibly hierarchical 
differences in the manifestations of these boundaries. I had previously 
formulated the hypothesis that phonological units are definable in 
terms of suprasegmental patterns, while their boundaries are mainly 
manifested in terms of modifications of segments. 7 Few, if any, 
indications of word boundaries emerged from the present study. There 
were a small number of instances in which the duration of the segment 
preceding a word boundary was greater than the duration of the same 
segment preceding a morpheme boundary. As far as the overall 
temporal organization of the utterances is concerned, no evidence for 
a hierarchical organization of boundaries was found as a result of 
this study. The temporal organization of spoken language seems to 
take place in terms of speech production units which are fairly 
independent of the morphological or syntactic structure of the 
utterances . 
Table I. Test materials used in the study.

The symbol - is used to indicate the boundary between stem and
derivative suffix. # symbolizes word boundary; ´ and ˘ refer
to stressed and unstressed syllables.

BASE     stick                     sleep                    shade                      speed
-Y       sticky                    sleepy                   shady                      speedy
-ER      sticker                   sleeper                  shader                     speeder
-ING     sticking                  sleeping                 shading                    speeding
-ILY     stickily                  sleepily                 shadily                    speedily
-INESS   stickiness                sleepiness               shadiness                  speediness
# ´      the stick fell            sleep heals              the shade lingered         speed kills
# ˘´     the stick is broken       sleep refreshes          the shade increased        the speed increased
# ˘˘´    the stick was discarded   my sleep was disturbed   the shade was refreshing   the speed was controlled
Table II. Mean durations (in milliseconds), standard deviations and
B/D ratios for two sets of words and corresponding syllable nuclei
produced by speaker R.G.

Utterance                   Duration    σ       B/D     Duration of    σ       B/D
                            of base             ratio   syl. nucleus           ratio

stick                       401.55      29.45           130.70         6.94
sticky                      312.80      23.68   1.284   93.45          6.53    1.399
sticker                     302.50      17.49   1.327   89.45          8.85    1.461
sticking                    295.45      16.92   1.359   88.80          7.28    1.472
stickily                    291.10      17.90   1.379   84.15          6.75    1.553
stickiness                  265.75      15.79   1.511   78.90          5.63    1.657
The stick fell              274.85      14.10   1.461   87.90          7.02    1.487
The stick is broken         248.20      12.65   1.618   81.65          7.57    1.601
The stick was discarded     245.10      13.49   1.638   77.90          5.81    1.678

sleep                       409.30      18.96           123.55         14.55
sleepy                      336.80      19.70   1.217   84.15          7.97    1.468
sleeper                     341.25      19.83   1.201   83.10          9.21    1.487
sleeping                    330.35      18.12   1.241   81.50          10.11   1.516
sleepily                    313.35      13.99   1.308   69.60          8.58    1.775
sleepiness                  287.05      13.81   1.428   62.05          6.79    1.991
Sleep heals                 305.95      16.33   1.339   75.95          8.41    1.627
Sleep refreshes             299.60      19.90   1.368   61.85          4.67    1.998
My sleep was disturbed      307.45      17.44   1.333   59.65          9.65    2.071

Table III. Mean durations (in milliseconds), standard deviations and
B/D ratios for two sets of words and corresponding syllable nuclei
produced by speaker L.S.

Utterance                   Duration    σ       B/D     Duration of    σ       B/D
                            of base             ratio   syl. nucleus           ratio

stick                       431.80      43.33           168.90         23.25
sticky                      346.00      34.44   1.248   115.50         15.83   1.462
sticker                     331.95      25.88   1.301   109.65         14.75   1.540
sticking                    348.30      30.56   1.240   109.20         17.36   1.547
stickily                    303.10      17.93   1.425   77.05          6.89    2.192
stickiness                  271.60      20.78   1.590   76.50          6.92    2.208
The stick fell              311.15      22.74   1.388   91.35          11.17   1.849
The stick is broken         283.90      19.46   1.521   88.85          10.83   1.901
The stick was discarded     268.15      28.40   1.610   80.75          8.42    2.092

sleep                       442.45      39.62           180.30         16.85
sleepy                      363.40      19.64   1.218   131.45         9.24    1.372
sleeper                     363.35      22.87   1.218   127.25         8.90    1.417
sleeping                    374.45      18.26   1.182   132.45         10.87   1.361
sleepily                    342.60      16.72   1.291   114.50         8.72    1.575
sleepiness                  307.70      16.39   1.438   96.55          8.45    1.867
Sleep heals                 325.00      25.33   1.361   113.55         14.77   1.588
Sleep refreshes             282.75      18.96   1.565   93.55          9.74    1.927
My sleep was disturbed      314.90      26.82   1.405   99.40          19.27   1.814

Table IV. Mean durations (in milliseconds), standard deviations and
B/D ratios for two sets of words and corresponding syllable nuclei
produced by speaker R.G.

Utterance                   Duration    σ       B/D     Duration of    σ       B/D
                            of base             ratio   syl. nucleus           ratio

speed                       511.50      34.95           266.00         28.17
speedy                      359.75      15.09   1.422   150.50         10.25   1.767
speeder                     344.75      16.42   1.484   141.50         11.01   1.880
speeding                    342.50      13.13   1.493   136.00         9.81    1.956
speedily                    322.50      18.03   1.586   120.00         8.27    2.217
speediness                  313.25      16.57   1.633   115.50         7.76    2.303
Speed kills                 344.00      17.06   1.487   125.50         8.87    2.120
The speed increased         301.25      15.12   1.698   110.00         7.61    2.418
The speed was controlled    293.50      20.53   1.743   104.00         8.97    2.558

shade                       454.10      28.88           266.15         18.61
shady                       327.20      20.08   1.388   181.85         14.79   1.464
shader                      324.20      18.81   1.401   172.40         9.54    1.544
shading                     306.95      23.39   1.479   158.00         11.24   1.684
shadily                     276.70      10.20   1.641   132.05         8.74    2.016
shadiness                   265.20      17.60   1.712   125.35         9.83    2.123
The shade lingered          324.80      18.49   1.398   146.95         16.23   1.811
The shade increased         298.60      18.44   1.521   130.15         12.93   2.045
The shade was refreshing    307.60      26.05   1.476   131.50         18.61   2.024

Table V. Mean durations (in milliseconds), standard deviations and
B/D ratios for two sets of words and corresponding syllable nuclei
produced by speaker L.S.

Utterance                   Duration    σ       B/D     Duration of    σ       B/D
                            of base             ratio   syl. nucleus           ratio

speed                       574.25      30.00           297.85         16.25
speedy                      394.85      23.89   1.454   163.30         11.69   1.824
speeder                     403.85      18.44   1.422   171.75         13.52   1.734
speeding                    396.10      24.54   1.450   158.75         12.86   1.876
speedily                    354.50      29.75   1.620   126.25         16.98   2.359
speediness                  322.70      23.41   1.780   104.40         6.66    2.853
Speed kills                 416.55      27.28   1.379   163.05         19.07   1.827
The speed increased         342.85      20.97   1.675   127.30         11.68   2.340
The speed was controlled    305.50      22.00   1.880   96.65          7.92    3.082

shade                       454.65      35.84           267.70         22.88
shady                       321.65      20.72   1.413   165.25         11.26   1.620
shader                      326.75      26.61   1.391   160.50         14.16   1.668
shading                     312.95      22.09   1.453   159.30         19.72   1.680
shadily                     294.15      26.41   1.546   139.95         21.89   1.913
shadiness                   261.65      25.37   1.738   112.55         11.65   2.378
The shade lingered          331.95      36.24   1.370   154.40         23.93   1.734
The shade increased         282.20      22.03   1.611   135.75         16.90   1.972
The shade was refreshing    273.40      23.97   1.633   114.25         15.97   2.343

Table VI. Mean durations (in milliseconds), standard deviations,
and B/D ratios for words derived with -y and -ness, in which the
-ness words are derived by a two-cycle operation from the base.

Speaker R.G.                            Speaker L.S.
Utterance     Duration  σ       B/D     Utterance     Duration  σ       B/D
              of base           ratio                 of base           ratio

stick         401.55    29.45           stick         431.80    43.33
stick-y       312.80    23.68   1.284   stick-y       346.00    34.44   1.248
sticky        513.25    37.52           sticky        557.45    36.59
sticky-ness   376.75    17.66   1.362   sticky-ness   388.50    24.34   1.435

sleep         409.70    18.96           sleep         442.45    39.62
sleep-y       336.70    19.70   1.217   sleep-y       363.40    19.64   1.218
sleepy        517.55    26.58           sleepy        544.20    30.99
sleepy-ness   369.65    14.15   1.400   sleepy-ness   392.20    18.46   1.388

speed         511.50    34.95           speed         574.25    30.00
speed-y       359.75    15.09   1.422   speed-y       394.85    23.89   1.454
speedy        529.95    26.23           speedy        597.40    16.94
speedy-ness   396.35    16.60   1.337   speedy-ness   410.55    31.19   1.455

shade         454.10    28.88           shade         454.65    35.84
shade-y       327.20    20.08   1.388   shade-y       321.65    20.72   1.413
shady         477.90    25.81           shady         490.60    24.43
shady-ness    346.30    16.46   1.380   shady-ness    329.70    23.87   1.488



Table VII. Average B/D ratios (speakers R.G. and L.S. combined);
WORD = base word, SN = syllable nucleus.

              stick, sleep          shade, speed
              WORD      SN          WORD      SN
Base          1.00      1.00        1.00      1.00
-Y            1.242     1.425       1.420     1.669
-ER           1.262     1.476       1.425     1.706
-ING          1.256     1.474       1.469     1.799
-ILY          1.351     1.774       1.599     2.126
-INESS        1.492     1.931       1.716     2.415
# ´           1.388     1.638       1.409     1.873
# ˘´          1.518     1.857       1.626     2.194
# ˘˘´         1.497     1.914       1.683     2.502
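Each B/D ratio in Tables II-VI is the duration of the base word divided by the duration of the base part of the derived form, and the Table VII figures appear to be means of four such ratios (two speakers × two base words). The Python sketch below reproduces one Table VII cell under that assumption; the function names are illustrative, not the authors' procedure.

```python
def bd_ratio(base_ms, derived_base_ms):
    """Base-word duration over the duration of the base part of the derived
    form (the B/D ratio of Tables II-VI), rounded to three places."""
    return round(base_ms / derived_base_ms, 3)


def table_vii_cell(ratios):
    """Mean of B/D ratios pooled over speakers and base words (assumed to be
    how the Table VII averages were formed)."""
    return round(sum(ratios) / len(ratios), 3)


# R.G., stick vs. the stick- part of sticky (Table II): 401.55 / 312.80
print(bd_ratio(401.55, 312.80))  # 1.284

# -Y row, WORD column for stick and sleep (R.G. and L.S., Tables II and III)
print(table_vii_cell([1.284, 1.217, 1.248, 1.218]))  # 1.242
```

A ratio above 1.00 means the base is shorter inside the longer form; in Table VII the -INESS forms have larger ratios than the -Y forms in every column.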
Fig. 1. B/D ratios for the words stick and sleep and their
syllable nuclei for speaker L.S. The base word and the
derivative forms are indicated on the vertical axis; the
horizontal axis is calibrated for ratios of duration of base
word/duration of the base part of the derived word.

Fig. 2. B/D ratios for the words stick and sleep and their
syllable nuclei for speaker R.G. The base word and the
derivative forms are indicated on the vertical axis; the
horizontal axis is calibrated for ratios of duration of
base word/duration of the base part of the derived word.

Fig. 3. B/D ratios for the words speed and shade and their
syllable nuclei for speaker R.G. The base word and the
derivative forms are indicated on the vertical axis; the
horizontal axis is calibrated for ratios of duration of
base word/duration of the base part of the derived word.

Fig. 4. B/D ratios for the words speed and shade and their
syllable nuclei for speaker L.S. The base word and the
derivative forms are indicated on the vertical axis; the
horizontal axis is calibrated for ratios of duration of base
word/duration of the base part of the derived word.
Comparison of Controlled and Uncontrolled Normal Speech Rate
Richard Gregorski, Linda Shockey, and Ilse Lehiste



Temporal studies have employed basically two methods for
eliciting speech rate: 1) controlled, i.e., externally induced
through the use of a pulsating beat, and 2) uncontrolled, i.e.,
internally generated by the subject with the instruction to maintain
a constant rate. Peterson and Lehiste (1960), in investigating the
influence of tempo on the duration of syllable nuclei, had their
subjects "speak in synchronism with a periodic pulse." Lindblom
(1963) used periodic clicks to manipulate speech rate in examining
vowel reduction under varying tempos. Kozhevnikov and Chistovich
(1969), in their experiment on the effect of rate on relative speech
durations, employed as a rate control a low-frequency periodic
oscillation generator which was triggered by the subject's initiation
of articulation. However, in their experiment to determine the
number of articulatory programs in a sentence of two syntagmas, no
external device was used to control rate; instead, the speaker was
"instructed to adhere during all pronunciations to one and the same
rate of speech." In their experimental check of syllable command
hypotheses using multiple repetitions of a sentence, the subjects
performed the task first at a rapid rate and then at a slow rate;
no external control appears to have been employed. Nooteboom and
Slis (1969), in their speech rate study, had their subjects freely
choose their fast, normal, and slow rates. Lehiste (1970b), in her
study of the temporal organization of monosyllabic and disyllabic
words in English, had her subjects maintain a "subjectively constant
rate."

To our knowledge, the comparability of the durations of speech 
units produced at a subjectively determined rate and those produced 
at a rate controlled by an external source has never been determined. 
If significant differences exist between temporal patterns occurring 
in speech produced by the two methods of elicitation, obvious 
questions arise. For example, to what extent could we then generalize 
about the temporal organization of speech from the previously 
mentioned studies executed with non-comparable methods? Would not 
the differences perhaps suggest two types of programming: 1) a
basic language program including speech-unit organization and natural 
rhythm information, and 2) a synchronization program whose task is 
to adjust the language program until its natural rhythm is synchronous 
with the external rhythm? 

It was the purpose of this experiment to determine the
comparability of controlled and uncontrolled normal speech rate for
both a sentence and a word spoken in isolation. Aggie was chosen as
the word and I bag Aggie as the sentence. The major criterion
for selecting these utterances was their relatively segmentable structure
when converted into oscillographic displays, and not their high
semantic content. Two native speakers of English were instructed
to produce both the word and the sentence about 150 times each at
a comfortably constant normal rate. From recordings of these
productions, oscillograms were made by use of a Frøkjær-Jensen
Trans-Pitch meter and an Elema-Schönander Mingograph (100 mm/sec).

Durations of individual segments and pauses were measured to the
nearest 1/2 millimeter (i.e., 5 milliseconds). The mean duration,
standard deviation, variance, and coefficient of variation (σ/x̄)
were computed on an IBM 360 computer for all possible combinations
of adjacent segments.
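The computation described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' program: each token is a list of segment durations in milliseconds in utterance order, the standard deviation is the sample (n - 1) form, and the segment labels and numbers are invented. Subtracting the controlled table from the uncontrolled one, unit by unit, would give a Difference column like the one in the Appendix tables.

```python
from statistics import mean, stdev


def adjacent_units(n_segments):
    """Yield (start, end) slices for every run of adjacent segments."""
    for length in range(1, n_segments + 1):
        for start in range(n_segments - length + 1):
            yield start, start + length


def cv_by_unit(labels, tokens):
    """Coefficient of variation (sd / mean) of every adjacent-segment unit.

    labels: segment symbols in utterance order, e.g. ["æ", "g", "i"]
    tokens: one list of segment durations (ms) per repetition
    """
    table = {}
    for start, end in adjacent_units(len(labels)):
        unit = "".join(labels[start:end])
        totals = [sum(token[start:end]) for token in tokens]
        table[unit] = stdev(totals) / mean(totals)
    return table


# Three invented repetitions of "Aggie" (durations in ms).
tokens = [[100, 60, 150], [110, 55, 140], [90, 65, 160]]
for unit, cv in cv_by_unit(["æ", "g", "i"], tokens).items():
    print(f"{unit:4s} {cv:.3f}")
```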

A Seth Thomas electronic metronome was used to implement the
control method. To obtain the pulse rate for the controlled utterances,
the mean duration of each speaker's interstress interval for both the
word and the sentence of the uncontrolled productions was converted
into an equivalent pulse interval on the metronome. Since for both
speakers the natural sentence stress fell on the /æ/ of Aggie, it was
decided to synchronize the click with this stress. The speakers were
instructed to repeat the production task, this time synchronizing
the /æ/ of Aggie with the click of the metronome. The same segmentation
procedures and statistical analyses that were used for the
uncontrolled utterances were applied to the controlled ones. The
differences between the coefficients of variation of the controlled
and uncontrolled sets were computed (see Tables I-VI in the Appendix).
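The conversion from the uncontrolled interstress interval to a metronome setting is simple arithmetic. The sketch below assumes the metronome is set in clicks per minute with one click per stressed /æ/; the paper says only that the mean interval was converted into an equivalent pulse interval, so the function name and the sample intervals are hypothetical.

```python
def metronome_setting(interstress_intervals_ms):
    """Convert the mean interstress interval (ms) into clicks per minute.

    Assumes one click per stressed /æ/, so the click period equals the
    mean interval between successive stresses in the uncontrolled set.
    """
    mean_interval = sum(interstress_intervals_ms) / len(interstress_intervals_ms)
    return 60_000.0 / mean_interval


# Invented intervals; a mean of 750 ms corresponds to 80 clicks per minute.
print(metronome_setting([740, 760, 750]))
```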

Figure I presents the coefficient of variation comparisons of
Speaker PM's controlled and uncontrolled Aggie spoken in isolation.
There was an average difference of 2% in the coefficients of variation
for segments. Notice that there was no difference between the
coefficients of variation of the stressed /æ/s; in absolute terms
there was only a 10-millisecond difference in their mean durations.
The syllables, word, and word + pause likewise had average coefficient
differences of about 2%. There was a 6% difference for the pauses.

Figure II presents the coefficient of variation comparisons of
Speaker LS's controlled and uncontrolled Aggie. Her average coefficient
differences for both segments and syllables were about 1 1/2%. There
was a 0.3% difference for the word.

Figures III and IV present the coefficient comparisons for
Speaker PM's controlled versus uncontrolled sentences. Segments,
syllables, and words as groups had average coefficient differences
of 1-2%. There was a 1% difference for the sentence and a 0.1%
difference for the sentence + pause.

Figures V and VI present Speaker LS's sentence comparisons.
Segments, syllables, and words as groups had average coefficient
differences of 1-2%. There was a 1% difference for the sentence and
a 3% difference for the sentence + pause.
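The percentages above are average differences between paired coefficients of variation, multiplied by 100. As a check, the following Python sketch recomputes two of speaker PM's averages from values transcribed from Appendix Table I; it assumes differences are taken without regard to sign, which agrees with the authors' finding that the differences had no systematic direction.

```python
# Appendix Table I, speaker PM, Aggie: (uncontrolled, controlled) CVs.
SEGMENTS = {"æ": (0.093, 0.093), "g": (0.158, 0.190), "i": (0.116, 0.150)}
TWO_SEGMENT_UNITS = {"æg": (0.063, 0.080), "gi": (0.087, 0.114)}


def average_difference(units):
    """Mean absolute difference between uncontrolled and controlled CVs."""
    diffs = [abs(unc - ctl) for unc, ctl in units.values()]
    return sum(diffs) / len(diffs)


for name, group in [("segments", SEGMENTS), ("two-segment units", TWO_SEGMENT_UNITS)]:
    print(f"{name}: {average_difference(group) * 100:.1f}%")
```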
Figure I. Coefficient of variation (σ/x̄ × 100) comparisons
of controlled versus uncontrolled speech-units for Aggie
produced by speaker PM.

Figure II. Coefficient of variation (σ/x̄ × 100) comparisons
of controlled versus uncontrolled speech-units for Aggie
produced by speaker LS.

Figure III. Coefficient of variation (σ/x̄ × 100) comparisons
of controlled versus uncontrolled speech-units for I bag
Aggie produced by speaker PM.

Figure IV. Coefficient of variation (σ/x̄ × 100) comparisons
of controlled versus uncontrolled speech-units for I bag
Aggie produced by speaker PM.

Figure V. Coefficient of variation (σ/x̄ × 100) comparisons
of controlled versus uncontrolled speech-units for I bag
Aggie produced by speaker LS.

Figure VI. Coefficient of variation (σ/x̄ × 100) comparisons
of controlled versus uncontrolled speech-units for I bag
Aggie produced by speaker LS.
To test the significance of these coefficient of variation
differences, we assumed that if the same magnitude of difference
exists between two uncontrolled sets and also between two controlled
sets, then such differences cannot be attributed to the control
technique. We divided both the controlled and uncontrolled sets
into sequential halves of about 75 tokens each. The average
coefficient differences between the uncontrolled halves and also
between the controlled halves were comparable to those between the
entire controlled and uncontrolled sets (see Table VII in the
Appendix). It thus appears that these differences are due to the
natural variability of speech in a repetition task and cannot be
attributed to the use of the periodic beat.
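The split-half reasoning can be expressed as a simple comparison. A sketch, assuming each set is a list of durations for one speech-unit in recording order; the authors compare magnitudes informally, so the rule used here (the between-set difference must exceed the larger split-half difference) is a hypothetical stand-in, not a test the paper describes.

```python
from statistics import mean, stdev


def cv(durations):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(durations) / mean(durations)


def split_half_difference(durations):
    """|CV(first half) - CV(second half)| for a set in recording order."""
    half = len(durations) // 2
    return abs(cv(durations[:half]) - cv(durations[half:]))


def exceeds_within_set_variation(uncontrolled, controlled):
    """True if the controlled/uncontrolled CV difference is larger than the
    split-half differences inside either set (hypothetical criterion)."""
    between = abs(cv(uncontrolled) - cv(controlled))
    within = max(split_half_difference(uncontrolled),
                 split_half_difference(controlled))
    return between > within


# Invented durations (ms), in recording order.
uncontrolled = [100, 110, 90, 100, 120, 80]
controlled = [100, 105, 95, 100, 115, 85]
print(exceeds_within_set_variation(uncontrolled, controlled))
```

With these invented numbers the between-set difference is smaller than the split-half differences, the situation the authors report, so the function prints `False`.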

The controlled and uncontrolled sets were also examined for the
direction of the differences between the coefficients of variation.
We found no systematic direction to these differences for either
speaker.

We conclude that in repetitions of the same words and sentences
spoken at a normal rate, the two methods described here produce
comparable results. However, we want to emphasize that we make no
claim regarding differences between controlled and uncontrolled speech
produced at other rates or using other elicitation techniques.
Appendix



TABLE I

Coefficient of variation (σ/x̄) comparisons of uncontrolled versus
controlled speech-units for Aggie produced by Speaker PM.

Speech-unit       Uncontrolled  Controlled  Difference  Average difference
æ                 .093          .093        .000        .022
g                 .158          .190        .032
i                 .116          .150        .034

æg                .063          .080        .017        .022
gi                .087          .114        .027

ægi               .060          .076        .016        .016

PAUSE             .184          .121        .063        .063

ægi + PAUSE       .069          .050        .019        .019

TABLE II

Coefficient of variation (σ/x̄) comparisons of uncontrolled versus
controlled speech-units for Aggie produced by speaker LS.

Speech-unit       Uncontrolled  Controlled  Difference  Average difference
æ                 .096          .083        .013        .013
g                 .157          .136        .021
i                 .149          .155        .006

æg                .089          .067        .022        .016
gi                .094          .103        .009

ægi               --            --          .003        .003

PAUSE             .136          .086        .050        .050

ægi + PAUSE       .085          .053        .032        .032
TABLE III

Coefficient of variation (σ/x̄) comparisons of uncontrolled versus
controlled speech-units for I bag Aggie produced by speaker PM.

Speech-unit       Uncontrolled  Controlled  Difference  Average difference
aɪ                .130          .168        .038        .022
b                 .096          .131        .035
æ1                .092          .101        .009
g1                .168          .147        .021
æ2                .080          .072        .008
g2                .211          .185        .026
i                 .126          .146        .020

aɪb               .087          .119        .032        .015
bæ1               .067          .083        .016
æ1g1              .081          .081        .000
g1æ2              .070          .061        .009
æ2g2              .085          .071        .014
g2i               .093          .114        .021
TABLE IV

Coefficient of variation (σ/x̄) comparisons of uncontrolled versus
controlled speech-units for I bag Aggie produced by speaker PM.

Speech-unit       Uncontrolled  Controlled  Difference  Average difference
aɪbæ              .070          .094        .024        .011
bæg               .063          .072        .009
ægæ               .054          .057        .003
gæg               .077          .062        .015
ægi               .066          .068        .002

aɪbæg             .068          .083        .015        .007
bægæ              .045          .054        .009
ægæg              .057          .055        .002
gægi              .061          .061        .000

aɪbægæ            .051          .063        .012        .007
bægæg             .049          .052        .003
ægægi             .049          .054        .005

aɪbægæg           .054          .059        .005        .006
bægægi            .044          .051        .007

aɪbægægi          .047          .056        .009        .009

PAUSE             .176          .153        .023        .023

aɪbægægi + PAUSE  .050          .051        .001        .001
TABLE V

Coefficient of variation (σ/x̄) comparisons of uncontrolled versus
controlled speech-units for I bag Aggie produced by Speaker LS.

Speech-unit       Uncontrolled  Controlled  Difference  Average difference
aɪ                .107          .175        .068        .023
b                 .126          .139        .013
æ1                .091          .082        .009
g1                .158          .165        .007
æ2                .086          .081        .005
g2                .159          .198        .039
i                 .140          .157        .017

aɪb               .068          .109        .041        .013
bæ1               .072          .063        .009
æ1g1              .070          .057        .013
g1æ2              .065          .056        .009
æ2g2              .076          .072        .004
g2i               .091          .089        .002
TABLE VI

Coefficient of variation (σ/x̄) comparisons of uncontrolled versus
controlled speech-units for I bag Aggie produced by speaker LS.

Speech-unit       Uncontrolled  Controlled  Difference  Average difference
aɪbæ              .063          .064        .001        .007
bæg               .059          .052        .007
ægæ               .056          .046        .010
gæg               .060          .059        .001
ægi               .071          .056        .015

aɪbæg             .055          .056        .001        .006
bægæ              .048          .042        .006
ægæg              .054          .047        .007
gægi              .061          .050        .011

aɪbægæ            .051          .045        .006        .009
bægæg             .048          .044        .004
ægægi             .056          .040        .016

aɪbægæg           .050          .043        .007        .009
bægægi            .049          .039        .010

aɪbægægi          .052          .039        .013        .013

PAUSE             .145          .102        .043        .043

aɪbægægi + PAUSE  .074          .045        .029        .029
TABLE VII

Coefficient of variation (σ/x̄) differences between various set
comparisons of speech-units for Aggie and I bag Aggie produced by
speakers PM and LS.
Word Unit Temporal Compensation
Linda Shockey, Richard Gregorski, and Ilse Lehiste



The theory of temporal compensation is based on the assumption 
that temporal programming information for "chunks" of speech larger 
than one linguistic segment is utilized at some unspecified, but 
rather late, level in the speech production mechanism. It is assumed 
that language is programmed in units no smaller than those defined by 
traditional manner-of-articulation parameters. Further, it is 
assumed that the domain over which temporal information is specified, 
and therefore over which the durational interaction described below 
takes place, is a programming unit. 

This means that the duration of some multisegmental string of
speech is fairly rigidly determined, and if this string or a stream
of speech containing this string is repeated many times at the
same rate, the duration of the programming unit will remain very 
close to its average every time it is produced. But the same will not 
necessarily be true for the subparts of the programming unit. Since 
it is the duration of the higher-level unit which is predetermined, 
the durations of the individual segments are free to vary somewhat, 
as long as their sum approximates very closely the duration of the 
higher unit. The extent to which segments can vary is postulated to 
be determined by external factors such as whether or not segmental 
duration is contrastive in the language being considered. 

Slis (1968) noted such a compensatory process in Dutch. He 
found that the lengths of several words of a given number of segments 
were quite similar despite substitution of segments with different 
intrinsic durations. A more sophisticated mathematical technique 
for testing for temporal compensation has been used by Kozhevnikov 
and Chistovich (1965) for Russian and by Ohala (1970), Allen (1969) 
and Lehiste (1970) for English. 

The latter technique involves measurement of segments and
determination of their variances and of correlation coefficients between
adjacent segments and groups of segments. The assumption is that if
there is little or no correlation in duration between adjacent
segments, then at some level each segment is programmed separately.
If so, the variance of the whole utterance or of any subpart of it
should be equal to the sum of the variances of the individual segments.
If an utterance is programmed in terms of more than one segment, we
expect negative correlations between subparts of the largest
programming unit; that is, if one part is longer than average, another
part will be shorter than average to allow the duration of the
programming unit to come quite close to its own average. If a negative
correlation is found, it should also be true that the sum of the
variances of the subparts of the utterance is greater than the variance
of the programming "chunk."
In her 1970 experiment, Lehiste found that negative correlations
exist between subparts of mono- and disyllabic words in English. The
experiment to be reported was designed to discover whether temporal
compensation operates within word-size units when they are included
in a sentence.



Methods 



Two subjects, PM and LS, both graduate students at The Ohio 
State University, were used. Each was seated in an I.A.C. sound- 
treated chamber with a high-quality Ampex microphone about one foot
from the speaker's mouth. Cards with the utterances we wished to elicit written
on them were placed on a table in front of the subject, one at a time. 
The subjects were asked to repeat a given utterance at a steady, 
comfortable rate of speech until signalled to stop. Each utterance 
was repeated 150 times or more. Recordings were made on an Ampex 
350 magnetic tape recorder at a speed of 7 1/2 i.p.s. 

One word, "Aggie," and one sentence, "I bag Aggie," were
recorded by both subjects. In addition, speaker PM recorded the
word "Agatha" and the sentence "I sass Agatha." These utterances were
chosen on the basis of potential segmentability.

The recordings were then processed through a Frøkjær-Jensen
Trans-Pitch meter and recorded in the form of a duplex oscillogram
by an Elema-Schönander Mingograf at a speed of 100 mm/sec. The
oscillograms were segmented following the standards set forth in 
Naeser (19&9). The duration of each segment was measured, with an 
accuracy estimated to be to the nearest 1/2 mm. or 5 msec. 

Both Ohala (19^8) and Kozhevnikov and Chistovich (1965) used
normalization procedures that involved choosing, out of their total set
of data, a group of utterances of highly similar duration, to eliminate
possible effects of differences in rate. Following their precedent,
we have based our conclusions on the 50 utterances closest to the
mean for each utterance and each speaker, in the belief that only
when variability of duration of the entire utterance is carefully
constrained can small variations within the utterance be examined
meaningfully.
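The normalization step can be sketched as follows. The paper does not say how distance from the mean was measured; this Python sketch assumes it is the absolute difference between a token's total duration and the mean total duration, with k = 50 as in the text. The function name and demo data are hypothetical.

```python
def closest_to_mean(tokens, k=50):
    """Keep the k repetitions whose total duration is nearest the mean.

    tokens: one list of segment durations (ms) per repetition.
    Ties are broken by recording order; kept tokens stay in that order.
    """
    totals = [sum(token) for token in tokens]
    overall_mean = sum(totals) / len(totals)
    ranked = sorted(range(len(tokens)), key=lambda i: abs(totals[i] - overall_mean))
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept]


# Invented five-token example with totals 100..500 ms; keep the middle three.
demo = [[60, 40], [120, 80], [180, 120], [240, 160], [300, 200]]
print([sum(t) for t in closest_to_mean(demo, k=3)])  # [200, 300, 400]
```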

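The results were processed by an IBM 360 computer. Statistical
measures derived were mean duration, standard deviation, variance,
relative variance, coefficient of variation (σ/x̄), and the
Pearson correlation coefficient

$$ r = \frac{\sum_{k=1}^{N}(x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{N}(x_k - \bar{x})^2 \, \sum_{k=1}^{N}(y_k - \bar{y})^2}} $$

where x_k and y_k are the durations of the two units being compared in
the k-th of the N utterances.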

Statistical tests were run on the following segments and
combinations of segments: 1) individual segments with each other;
2) all possible combinations of n segments with each other and
with other combinations of n segments, where n ranges from 2 to the
number of segments in the utterance minus one, with the provision
that the two sets being tested for correlation have no segments in
common (when n > 2, only adjacent sets of three, four, etc. segments
are used); and 3) individual segments with sets of n segments. In
addition, measurements were made of the pauses between the utterances,
and correlations were calculated between utterances and pauses.
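The pairing conditions above leave some room for interpretation (for instance, whether the n = 2 case includes non-adjacent segments). One reading, sketched in Python, treats every unit as a run of adjacent segments and pairs units that share no segment; the function names are hypothetical and this is not the authors' program.

```python
from itertools import combinations


def spans(n_segments, size):
    """All runs of `size` adjacent segments, as (start, end) slices."""
    return [(s, s + size) for s in range(n_segments - size + 1)]


def disjoint(a, b):
    """True if two (start, end) spans share no segment."""
    return a[1] <= b[0] or b[1] <= a[0]


def correlation_pairs(n_segments):
    """Unit pairs to correlate, following one reading of conditions 1)-3)."""
    singles = spans(n_segments, 1)
    # 1) individual segments with each other
    pairs = list(combinations(singles, 2))
    for size in range(2, n_segments):
        units = spans(n_segments, size)
        # 2) equal-sized multi-segment units with no segment in common
        pairs += [(a, b) for a, b in combinations(units, 2) if disjoint(a, b)]
        # 3) individual segments with non-overlapping multi-segment units
        pairs += [(s, u) for s in singles for u in units if disjoint(s, u)]
    return pairs


# "Aggie" segmented as æ-g-i: three segment pairs plus æ|gi and æg|i.
print(correlation_pairs(3))
```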
Results



1. Of standardization. 

We found that standardization of "rate" in this very restricted
sense gave us a much clearer picture of which segments and combinations
of segments were interacting with each other than we could have formed
by looking at the complete set of 150 tokens. The following table
shows the numbers of significant negative correlations found in the
largest group and in the subset of 50 for the two sentences:



TABLE I

Numbers of significant negative correlations at the .01 level before
and after normalization

Speaker   Tokens   "I bag Aggie"   "I sass Agatha"
PM        50       55              94
PM        150      1               45
LS        50       42              --
LS        150      12              --


2. For words in isolation: 

In the majority of cases, there were significant negative
correlations (at the .01 level) between adjacent segments in the word
"Aggie" for both speakers. Although negative correlations were present
in all cases between adjacent segments in the word "Agatha" as spoken
by PM, all except one failed to reach significance at the .01 level.
Higher negative correlations, predominantly significant, were found when
larger portions of the word were tested, the highest negative
correlation coefficient values being for mutually exclusive subsets of
the whole word, e.g. Dege-do 3.

Typical results are presented graphically in Fig. 1. Tables containing 
additional information on mean, standard deviation, variance, relative 
variance, and correlation coefficient are to be found in the appendix. 

3. For sentences: 

Correlations between adjacent segments in the sentences "I bag
Aggie" and "I sass Agatha" were all similar to those for the word "Agatha":
negative, but tending to be below the .01 level of significance. However,
note in Fig. 2 that for both speakers and both sentences there is quite
a strong negative correlation between the last two segments (this may
indicate a tendency for temporal adjustment to take place utterance-finally).

Figure 1.

Figure 2.

For speaker PM, there is a tendency to have stronger correlations
between units of larger sizes, the largest being between mutually
exclusive subparts of the whole sentence. This does not hold for LS,
although her correlations between segments are consistently smaller
on average than her correlations between larger elements (see
Figs. 2 and 3).



Figure 3. 
For speaker PM there are two significant negative correlations
between subparts of the word "Aggie" when it is included in the
sentence "I bag Aggie," both in the low range (see Table 6,
Appendix). For LS there were three (Table 7), two of them being the
highest negative correlations for this speaker and this sentence.

For speaker PM, the same tendencies hold for "Agatha" when it is
included in the sentence "I sass Agatha": there are seven significant
negative correlations between elements of the word, but higher negative
correlations result from testing larger portions of the sentence
without regard to traditional word boundaries.

4. For utterance and pause:

As may be seen from the following table, very high negative 
correlation coefficients were found for all tested utterances compared 
with the following pause: 



TABLE 2

Correlation coefficients for whole utterance and following pause

Speaker   "Aggie"        "Agatha"   "I bag Aggie"   "I sass Agatha"
PM        [illegible]    -.820      [illegible]     -.901
LS        -.710          n/a        -.828           n/a

Discussion:

The most obvious conclusion to be reached from these data is that,
if examining variance and correlation coefficients is indeed a
legitimate means of detecting temporal compensation, temporal
compensation occurs to a high degree between portions of these short
utterances. It would appear, then, that at some level the entire
utterance is programmed as a whole, since all segments and combinations
of segments play a part in this temporal interaction.

For speaker PM we find no convincing evidence that the words
"Aggie" and "Agatha" maintain integrity as units when embedded in a
longer context. For speaker LS, we find that although the parts of the
word "Aggie" do definitely interact temporally with the rest of the
utterance, the most regular negative correlation is between parts of
the word. Thus there is some possibility that this speaker employs a
strategy involving word units. However, it seems equally likely that
the relationship between two facts, that there is a high negative
correlation between [æg] and [i] for speaker LS and that [ægi] can be
an utterance by itself, is not causal. Further studies will be needed
to disambiguate these data.

In the present study, lexical words did not emerge as units within
which temporal compensation takes place. Rather, they seemed to be
merged into a phonological phrase, losing their separate identity.

It thus seems unlikely that the "word" level will prove as useful in
phonological description as it is in lexical and syntactic
description. It is not inconceivable that temporal compensation may
serve to determine the extent of linguistic units at a higher level,
such as a phrase or breath-group. The next step, of course, is to
examine utterances containing an embedded sentence or phrase for units
displaying internal cohesion.

Much research, especially that of Stetson (1951) and Kozhevnikov and
Chistovich (1965), has been based on the hypothesis that the syllable
is the basic unit of speech production. There is some evidence from
electromyography that this may be so at some level in articulatory
programming (MacNeilage and DeClerk, 1968). But our results show a
singular lack of evidence for postulating either the CV or the VC syllable
as a basic unit of temporal programming for English. For all of our
sets of data, spoken at a very similar rate, there are various degrees
of negative correlation between most adjacent segments, with no clear
indication of a stronger bond within CV or VC sequences.

We agree with Ohala's statement that "Chistovich and her colleagues
took the units [of speech production] to be syllables based on the
results of a previous experiment, in which it was shown that the
duration of the words and syllables relative to the duration of the
whole utterance remained constant during changes of rate, but the
relative durations of the consonants and vowels, the components of the
syllable, varied during changes of rate. Thus the smallest interval
maintaining relative temporal 'integrity' in the face of changes in
rate was the syllable, at least in Russian. But these results could
as well be taken as indicating that the articulatory unit could be no
smaller than the syllable but it could be larger" (p. 145).

While it is undeniable that the syllable plays a significant part
in speech rhythm and may at some level be a measure of speech units,
we find no evidence for postulating it as a primary building block,
in English, at the level of programming which we presume to be
observable through the process of temporal compensation.

The amazingly high negative correlations between the speech
portions of our data and the following pauses reflect the high
accuracy with which our subjects were able to execute the request to
speak at a steady rate. We realize that the speech situation which we
have created is artificial in that it is conducive to a measured
rhythm; however, we still feel that it is interesting to note that in
all probability the pause is programmed with the speech as a temporal
unit. The internal programming of the utterance itself apparently
takes place at one level; at the next higher level, the unit of
programming is the sentence plus the following pause. This may
indicate, as suggested by Ohala (1970), that the mechanism for
isochrony is indeed part of the linguistic competence of the speaker
of English.
Appendix

These tables are to be read as follows:

The (a) tables indicate the mean, standard deviation, variance,
and relative variance for each of the variables to be used when
testing for correlation.

The (b) tables show correlation coefficients, ordered from lowest
to highest for each utterance. On either side of a hyphen are the
two variables being considered. Note that a variable may contain
any number of segments and that the correlations represented are
between the two variables on either side of the hyphen taken as
units. Therefore, if you see aɪb-ægi, this means we are considering
[aɪb] as a unit in this particular comparison and plotting its
durational values against those of the "unit" [ægi].

Since the means, standard deviations, etc. of the sum of the items
being compared are always identical with the same information for one
of the variables when we are dealing with words [æ+gi = ægi], and since
the same is often true when we are dealing with sentences [aɪb-æg = aɪbæg],
this information is left out of the table when there is overlap.

Note that the comparisons whose sum is not equal to one of the
variables involve non-adjacent elements. About half of PM's significant
negative correlations are for non-adjacent units, but for LS only two
are. This may be further evidence for a difference in programming
strategy.

Formulae for statistical variables are:

Mean: $M = \frac{\sum x}{N}$

Standard deviation: $\sigma = \sqrt{\frac{\sum (x - M)^2}{N}}$

Variance: $V = \sigma^2$

Relative variance: $\frac{V}{M}$

Variation coefficient: $\frac{\sigma}{M}$
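The formulae above can be expressed as one short function. This is a sketch that assumes population statistics (dividing by N, as in the standard-deviation formula); the function name `describe` and the durations in the first example are our own and are not the study's data. The second check simply recomputes the two derived columns of Table 3 from the printed M, σ and σ² for the first variable of "Aggie".

```python
import math


def describe(durations):
    """Mean, standard deviation (divisor N), variance, relative variance
    and variation coefficient, following the formulae above."""
    n = len(durations)
    m = sum(durations) / n
    var = sum((x - m) ** 2 for x in durations) / n  # V = sigma squared
    sd = math.sqrt(var)
    return {"M": m, "sigma": sd, "V": var,
            "relative_variance": var / m,
            "variation_coefficient": sd / m}


# Invented durations (ms) for illustration.
print(describe([150, 160, 170, 180, 190]))

# Recomputing the derived columns from printed values (Table 3, speaker PM).
M, sigma, V = 169.20, 13.05, 170.36
print(round(V / M, 3), round(sigma / M, 3))  # 1.007 0.077
```

Because the relative variance and the variation coefficient are both scaled by the mean, they allow segments of very different lengths (a short stop and a long vowel, say) to be compared for variability.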
TABLE 3

Speaker PM: "Aggie"

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    æ             169.20     13.05     170.36     1.007       .077
    g              64.80     11.44     130.96     2.021       .177
    i             169.20     11.63     135.36      .800       .069
    æg            234.00     11.09     123.00      .526       .047
    gi            234.00     13.08     171.00      .731       .056
    æ+i           338.40     13.29     176.56      .522       .039
    ægi           403.20      5.81      33.81      .084       .014

(b) Variables        r
    g-i            -.358
    æ-i            -.425
    æ-g            -.597
    æg-i           -.870
    æ+i-g          -.900
    æ-gi           -.901
TABLE 4

Speaker LS: "Aggie"

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    æ             237.40     16.47     271.25     1.143       .069
    g              89.30     14.42     208.02     2.329       .162
    i             165.70     19.93     397.01     2.396       .120
    æg            326.70     19.74     389.69     1.193       .060
    gi            255.00     16.82     283.00     1.110       .066
    æ+i           403.10     16.74     280.06      .695       .042
    ægi           492.40      8.74      76.38      .155       .018

(b) Variables          r
    g-i              -.560
    æ-i              -.591
    [illegible]-g    -.853
    æ-gi             -.862
    æg-i             -.903
TABLE 5

Speaker PM: "Agatha"

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    æ             100.50     15.21     231.25     2.301       .151
    g              45.40      9.64      92.84     2.045       .212
    ə1             67.50      9.45      89.25     1.322       .140
    θ              92.70     10.78     116.21     1.254       .116
    ə2            140.50     16.13     260.25     1.852       .115
    æg            145.90     16.58     271.70     1.862       .113
    gə            112.90     11.18     125.09     1.108       .099
    əθ            160.20     10.39     107.96      .674       .065
    θə            233.20     18.89     356.76     1.530       .081
    ægə           213.40     17.65     311.45     1.459       .083
    gəθ           205.60     12.64     159.64      .776       .061
    əθə           300.70     17.06     291.06      .968       .057
    ægəθ          306.10     16.75     280.44      .916       .055
    ægəθə         446.60      5.71      32.63      .073       .013

(b) Variables          r
    ægə-θ            -.387
    [illegible]-θə   [illegible]
    gə-θə            -.470
    ə1-θ             -.479
    [illegible]-ə2   -.757
    æ-θə             -.761
    æg-θə            -.772
    æg-ə2            -.810
    æ-əθə            -.828
    ægə-ə2           -.858
    ægəθ-ə2          -.940
    æg-əθə           -.943
    ægə-θə           -.953
TABLE 6

Speaker PM: "I bag Aggie"

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    aɪ            100.70      7.75      60.01      .596       .077
    b              70.40      6.47      41.84      .594       .092
    æ1            136.90     10.58     111.89      .817       .077
    g1             51.10      6.02      36.29      .710       .118
    æ2            159.40     11.94     142.64      .895       .075
    g2             56.30     10.09     101.81     1.808       .179
    i             150.30     12.51     156.41     1.041       .083
    aɪb           171.10     10.06     101.30      .592       .059
    bæ            207.30     11.71     137.22      .662       .057
    æ1g1          188.00     12.12     147.00      .782       .064
    g1æ2          210.50     11.67     136.25      .647       .055
    æ2g2          215.70     14.66     215.01      .997       .068
    gi            206.60     11.98     143.45      .694       .058
    aɪbæ          308.00     14.14     200.00      .649       .046
    aɪbæg         359.10     15.74     247.81      .690       .044
    aɪbægæ        518.50     14.53     211.25      .407       .028
    aɪbægæg       574.80     15.59     243.19      .423       .027
    aɪbægægi      725.10      8.61      74.06      .102       .012
    bæg           258.40     13.13     172.50      .668       .051
    bægæ          417.80     14.37     206.38      .494       .034
    bægæg         474.10     15.36     235.88      .498       .032
    bægægi        624.40     11.23     126.06      .202       .018
    ægæ           347.40     14.98     224.31      .646       .043
    ægæg          403.70     16.40     268.88      .666       .041
    ægægi         554.00     12.17     148.00      .267       .022
    gæg           266.80     13.89     192.88      .723       .052
    gægi          417.10     13.50     182.25      .437       .032
    ægi           366.00     14.49     210.00      .574       .040

(b) Variables         r         M         σ         σ²     Relative   Variation
                                                           variance   coefficient
    (M through variation coefficient refer to the summed unit, where printed)
    g1-ægi         -.36[?]    417.10     13.50     182.25      .437       .032
    aɪ-æ2          -.378      260.10     11.52     132.63      .510       .044
    aɪbæ-gæ        -.379
    æ1g1-i         -.381      338.30     13.70     187.75      .555       .041
    aɪbæ-gæg       -.381
    aɪbæ-æ2g2      -.339      523.70     15.93     253.88      .485       .030
    aɪbæ-æ2        -.403      467.40     14.37     206.38      .442       .031
    b-ægægi        -.405
    aɪb-gæg        -.408      437.90     13.42     180.19      .411       .031
    aɪbæg-i        -.415      509.40     15.52     240.75      .473       .030
    æ1-i           -.416      287.20     12.58     158.25      .551       .383
    bægæ-i         -.419      568.10     14.56     212.06      .373       .026
    aɪb-æ2g2       -.428      386.80     13.78     189.94      .491       .036
TABLE 6 (continued)

(b) Variables         r         M         σ         σ²     Relative   Variation
                                                           variance   coefficient
    aɪb-gæ         -.430      381.36     11.69     135.56      .358       .031
    ægæ-i          -.434      497.70     14.78     218.31      .439       .030
    æ2g2-i         -.440      366.00     14.49     210.00      .574       .040
    aɪbæ-i         -.446      458.30     14.10     198.81      .434       .031
    bæ-gi          -.447      413.90     12.46     155.38      .375       .030
    g2-i           -.445      206.60     11.98     143.45      .694       .058
    bæg-gi         -.467      465.00     13.00     169.00      .363       .028
    æ1-ægi         -.472      502.90     13.31     177.25      .352       .026
    aɪb-æ2         -.473      330.50     11.41     130.25      .394       .035
    aɪbæg-æ2g2     -.476
    aɪbæg-æ2       -.477
    æ1-gi          -.482      343.50     11.54     133.25      .388       .034
    æ1g1-gi        -.494      394.60     12.12     147.00      .373       .031
    æ1-gægi        -.513
    aɪbæ-gi        -.539      514.60     12.69     161.00      .313       .025
    aɪbæg-gi       -.541      565.70     13.68     187.13      .331       .024
    aɪ-gægi        -.549      517.80     11.29     127.38      .246       .022
    bæ-ægi         -.564      573.30     12.48     155.88      .272       .022
    aɪ-ægi         -.567      466.70     11.95     142.69      .306       .026
    aɪbægæ-i       -.579      668.80     12.53     157.06      .234       .019
    æ1g1-ægi       -.595
    aɪb-gægi       -.596      588.20     11.03     121.63      .207       .019
    [illegible]    -.596
    aɪb-ægi        -.597      537.10     11.72     137.25      .256       .022
    ægæ-gi         -.613
    aɪ-bægægi      -.644
    bægæ-gi        -.650
    bæg-ægi        -.674
    ægæg-i         -.676
    bægæg-i        -.693
    aɪb-ægægi      -.716
    aɪbæ-ægi       -.783
    aɪbægæ-gi      -.806
    aɪbæ-gægi      -.807
    aɪbægæg-i      -.835
    aɪbæg-ægi      -.841
TABLE 7

Speaker LS: "I bag Aggie"

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    aɪ            145.10     12.67     160.50     1.106       .087
    b              58.60      7.55      57.04      .973       .129
    æ1            169.10     15.80     249.70     1.477       .093
    g1             57.90      9.17      84.09     1.452       .158
    æ2            197.20     12.50     156.16      .792       .063
    g2             60.30     12.06     145.41     1.811       .150
    i             127.70     13.94     194.21     1.521       .109
    aɪb           203.70     11.74     137.81      .677       .058
    bæ            227.70     15.75     248.21     1.090       .069
    æ1g1          227.00     15.10     228.00     1.004       .067
    g1æ2          255.10     12.47     155.50      .610       .049
    æ2g2          277.50     15.00     243.25      .877       .056
    gi            208.00     11.36     129.00      .620       .055
    aɪbæ          372.80     16.35     267.31      .717       .044
    aɪbæg         430.70     14.94     223.06      .518       .035
    aɪbægæ        627.90     15.11     228.44      .364       .024
    aɪbægæg       708.20     17.19     295.38      .417       .024
    aɪbægægi      835.90     14.73     216.94      .260       .018
    bæg           285.60     14.1[?]   200.75      .703       .050
    bægæ          482.80     16.53     273.38      .566       .034
    bægæg         563.10     16.82     288.06      .503       .030
    bægægi        690.80     14.37     206.56      .299       .021
    ægæ           424.20     17.16     294.44      .694       .040
    ægæg          504.50     17.50     306.25      .607       .035
    ægægi         632.20     16.06     257.94      .408       .025
    gæg           335.40     14.86     220.94      .659       .044
    gægi          463.10     12.57     158.06      .341       .027
    ægi           405.20     12.81     164.00      .405       .032

(b) Variables          r
    aɪ-[illegible]   -.354
    aɪb-ægæg         -.362
    bæg-æ2g2         -.364
    æ1-g1            -.365
    aɪbæg-æ2g2       -.36[?]
    b-ægi            -.310
    g1-[illegible]   -.370
    g1-æ2g2          -.372
    æ1-gægi          -.377
    g1-ægi           -.383
    aɪ-bæg           -.385
    aɪbæ-gæg         -.397
TABLE 7 (continued)
TABLE 8

Speaker PM: "I sass Agatha"

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    aɪ            103.90      8.56      73.29      .705       .082
    s1             94.20      5.95      35.36      .375       .063
    æ1            125.00     10.00     100.00      .800       .080
    s2             82.10      5.39      39.09      .354       .066
    æ2            134.00      8.49      72.00      .537       .063
    g              43.00      7.21      52.00     1.209       .168
    ə1             57.40      8.14      66.24     1.154       .142
    θ              82.10      9.60      92.09     1.122       .117
    ə2            137.70     11.23     126.21      .917       .082
    aɪs           198.10      8.99      80.89      .408       .045
    æ1s2          207.10     10.91     119.10      .575       .053
    æ2g           177.00     10.34     107.00      .605       .058
    gə            100.40      9.27      85.84      .855       .092
    əθ            139.50      9.81      96.25      .690       .070
    θə            219.80     11.04     121.97      .555       .050
    s1æ1          219.20     10.65     113.36      .517       .049
    s2æ2          216.10      8.90      79.30      .367       .041
    sæs           301.30     12.20     148.94      .494       .041
    ægəθə         454.20     12.59     158.44      .349       .028
    aɪsæs         405.20     13.27     176.00      .434       .033
    aɪ+ægəθə      558.10     12.09     146.06      .262       .022
    sæsægəθə      755.50      9.06      82.13      .109       .012
    aɪsæsægəθə    859.40      5.04      25.38      .030       .006
    aɪsæ          323.10     12.08     146.00      .452       .037
    aɪsæs         405.20     13.27     176.00      .434       .033
    aɪsæsæ        539.20     10.32     106.44      .197       .019
    aɪsæsæg       582.20     11.70     137.00      .235       .020
    aɪsæsægə      639.60     11.36     129.00      .202       .018
    aɪsæsægəθ     721.70     12.00     144.06      .200       .017
    sæsæ          435.30     10.42     108.63      .250       .024
    sæsæg         478.30     11.74     137.81      .288       .025
    sæsægə        535.70     11.83     140.06      .261       .022
    sæsægəθ       617.80     14.14     199.88      .324       .023
    sæsægəθə      755.50      9.06      82.13      .109       .012
    æsæ           341.10      9.61      92.44      .271       .028
    æsæg          384.10     10.04     100.81      .262       .026
    æsægə         441.50     11.32     128.25      .290       .026
    æsægəθ        523.60     12.82     164.25      .314       .024
    æsægəθə       661.30      8.16      66.63      .101       .012
    sæg           259.10     11.65     135.81      .524       .045
    sægə          316.50     12.18     148.25      .468       .038
TABLE 8 (continued)

(a) Variable         M         σ         σ²     Relative   Variation
                                                variance   coefficient
    sægəθ         398.60     13.57     184.19      .462       .034
    sægəθə        536.30     11.88     141.06      .263       .022
    ægə           234.40     12.31     151.64      .647       .053
    ægəθ          316.50     13.65     186.25      .588       .043
    gəθə          320.20      9.95      99.00      .309       .031
    əθə           277.20     10.16     103.19      .372       .037

(b) Variables         r         M         σ         σ²     Relative   Variation
                                                           variance   coefficient
    (M through variation coefficient refer to the summed unit, where printed)
    s1æ1-ægəθ      -.356      535.70     14.00     196.06      .366       .026
    aɪsæ-θə        -.362      542.90     13.09     171.25      .315       .024
    sæsæg-θə       -.363      698.10     12.87     165.69      .237       .018
    s2-ə1          -.365      139.50      7.95      63.25      .453       .057
    aɪ-sæsæg       -.369
    aɪ-sægəθ       -.375      502.50     13.05     170.25      .339       .026
    aɪsæs-θə       -.380      625.00     13.66     186.50      .298       .022
    æs-əθə         -.384      484.30     11.72     137.25      .283       .024
    g-əθə          -.384
    aɪs-æsæ        -.387
    aɪsæsæg-əθ     -.388
    aɪsæsæg-ə1     -.390
    [illegible]    -.392      573.00     11.96     143.00      .250       .021
    aɪs-æsægə      -.393
    ə1-θ           -.397
    aɪ-ægəθə       -.397      558.10     12.09     146.06      .262       .022
    æ1-ægəθ        -.403      441.50     13.28     176.25      .399       .030
    s1æ1-gəθə      -.404      539.40     11.26     126.75      .235       .021
    s2-aɪ+ægəθə    -.408      640.20     11.04     121.94      .190       .017
    aɪs-sægəθə     -.412      734.40     11.57     133.88      .182       .016
    sæg-əθə        -.414
    aɪ-sæsægə      -.416
    aɪsæsæ-ə2      -.419      676.90     11.64     135.50      .200       .017
    s2-əθə         -.421      359.30      9.28      86.19      .240       .026
    sæs-əθə        -.423      578.50     12.13     147.25      .255       .021
    aɪ-sæsæ        -.423
    aɪs-əθə        -.424      475.30     10.33     106.63      .224       .022
    æ1-gəθə        -.437      445.20     10.59     112.06      .252       .024
    aɪs-æsægəθ     -.438
    æ1-sægəθ       -.442
    sæ-sæ          -.443
    æ1-ægə         -.446      359.40     11.91     141.75      .394       .033
    æsæg-θə        -.446      603.90     11.13     123.81      .205       .018
    θ-ə2           -.447
    sæ-sæg         -.449
    aɪ-sægəθə      -.454      640.20     11.04     121.94      .190       .017
    æ1-æ2          -.454      259.00      9.75      95.00      .367       .038
TABLE 8 (continued)

(b) Variables         r         M         σ         σ²     Relative   Variation
                                                           variance   coefficient
    sæs-gəθə       -.458      621.50     11.69     136.75      .220       .019
    aɪ-æsægəθ      -.460      627.50     11.68     136.50      .218       .019
    æsæg-ə2        -.461      521.80     11.09     123.00      .236       .021
    sæsæg-ə2       -.469      616.00     11.85     140.44      .228       .019
    sæ-sægə        -.469
    ə1-θə          -.473
    æs-ægəθ        -.474
    sægə-θə        -.480
    s1æ1-ægə       -.487      453.60     11.71     137.19      .302       .026
    aɪsæsæg-ə2     -.488      719.90     11.61     134.88      .187       .016
    æ1-sægə        -.493
    æsægə-ə2       -.499      579.20     11.29     127.44      .220       .019
    s1æ1-æ2        -.5[?]1    353.20      9.74      94.81      .268       .016
    ægəθ-ə2        -.502
    æs-gəθə        -.514      527.30     10.32     106.44      .202       .020
    æs-ægə         -.530
    aɪsæs-æg       -.532
    æs-æ2          -.533
    [illegible]    -.533
    əθ-ə2          -.541
    sæs-æ2         -.542
    sæsægə-ə2      -.552      673.40     10.93     119.50      .177       .016
    aɪsæ-sæ        -.553
    sægəθ-ə2       -.555
    aɪsæ-sægə      -.562
    aɪsæ-sægəθ     -.568
    aɪ-sæs         -.579
    aɪsæ-æg        -.581      500.10     10.38     107.69      .215       .021
    aɪsæ-æ2        -.584      457.10      9.91      98.25      .215       .022
    aɪsæsægə-ə2    -.587      777.30     10.27     105.50      .136       .013
    aɪsæs-ægəθ     -.603
    æ1-æ2g         -.604      302.00      9.06      82.00      .272       .030
    sæsæ-gəθə      -.605
    aɪsæs-æg       -.608
    aɪsæs-əθə      -.609      682.40     10.72     115.00      .169       .016
    aɪsæs-æ2       -.629
    aɪsæs-gəθə     -.639      725.40     10.31     106.31      .147       .014
    æsæ-gəθə       -.652      661.30      8.16      66.63      .101       .012
    æ1-ægəθə       -.653      579.20      9.70      94.13      .163       .017
    æsæg-əθə       -.613
    sæ-sægəθə      -.681
    sæsægə-θə      -.688
    sæs-ægəθə      -.733
    æ1-sægəθə      -.734
    æsægə-θə       -.734
TABLE 8 (continued)

(b) Variables         r         M         σ         σ²     Relative   Variation
                                                           variance   coefficient
    æ1-aɪ+ægəθə    -.736      683.10      8.25      68.06      .100       .012
    aɪ-æsægəθə     -.759      765.20      5.81      33.81      .044       .008
    sæsægəθ-ə2     -.768
    æsægəθ-ə2      -.777
    aɪs-æsægəθə    -.832
    aɪ-sæsægəθə    -.838
    aɪsæ-ægəθə     -.863      777.30      6.47     111.81      .054       .008
    aɪsæsæ-gəθə    -.877
    aɪsæsægə-θə    -.899      859.40      5.04      25.38      .030       .006
    aɪsæsæg-əθə    -.903
    aɪsæsægəθ-ə2   -.908
    aɪsæ-sægəθə    -.912
    aɪsæs-ægəθə    -.925
Relative Intelligibility of Five Dialects of English

Mary Virginia Wendell



In the production of the various linguistic atlases of the English
language, numerous word lists and phonetic descriptions have been made
of the many regional and social dialects of English, "dialect"
being defined as special varieties of usage and/or pronunciation
within the range of a given linguistic system (Reed, 1967, p. 2).

Thus, a language may be considered to be a collection of related
dialects in a particular area, often encompassing a single nationality.
Carroll Reed (1967, p. 2) has said, "Languages are not mutually
intelligible; different dialects of the same language are ordinarily
mutually intelligible (with some notable exceptions, such as certain
dialects of Chinese)." The purpose of this study is to determine how
intelligible certain dialects of English are to native speakers of one
particular dialect.

In a study by L. S. Harms (1961), listeners of three status groups
attempted to reconstruct spoken messages of speakers of the three
statuses. Listeners achieved the highest comprehension scores when speaker
and listener status were the same. In the present study, this finding
has been extended to the intelligibility of regional dialects.

Five dialects of English were presented to listeners who were native
speakers of one of the dialects. The highest intelligibility scores were
expected when speaker and listener dialect coincided. Since no other
data on the relative intelligibility of the five dialects involved in the
study are available, no prediction was made as to the most difficult
dialect to understand.

Dialects for the study were chosen on the basis of their differences 
from the control dialect, which was that of Columbus, Ohio. At least 
two of the speakers chosen demonstrated idiolectal differences, but the 
speakers were selected because their speech patterns were very close 
to those of the dialects they represented, and quite different from 
those of the control dialect. 

The dialects chosen for the study were: Columbus, Ohio, an example
of General American speech; Long Island, New York, Jewish community;
Portsmouth, Ohio, an example of what can be called Rural Southern Ohio
speech, a mixture of General American and Southern speech; one variety
of British stage speech; and Black American (the urban variety of this
dialect, rather than what is known as Southern Negro speech). No
attempt was made to investigate the intelligibility of dialectal words and
sentence patterns. The test used examined only word
intelligibility, i.e. pronunciation differences.

Brief descriptions of the dialects follow. All are taken from
C. M. Wise's Applied Phonetics (1958). Only the more prominent
features are listed, with particular attention to those characteristics
which are applicable to the listening test items.

General American Speech is characterized by the following pronunciations:

1. [ɔ] is the most common low back vowel except in words like
water, sorrow, not and possible, where the vowel is [ɑ].

2. [?] and [?] are in free variation in words like care and dear.

3. Stressed long vowels diphthongize; for example, [eɪ] in ate
and stay; [oʊ] in go, soul, and below. The diphthongs appear
as pure vowels in weakly stressed syllables.

4. Central vowels are [ʌ], which is close to [ə] except in
tenseness and duration, and [ɝ], as in bird, turn, and murmur.

5. All unstressed vowels reduce to [ə] or [?] except [e] and [i]
before another vowel. These two vowels reduce to [?]. [ə]
before [l], [m], [r], and [n] reduces to syllabic [l], [m],
[r], and [n].

6. [r] is always pronounced and never intrusive except sometimes
in wash.

7. [l] is usually back except after high front vowels, and is
often rounded after rounded vowels.

8. [?] is frequently lost when final.

The Southern-General American border region is characterized more
by stress and intonation patterns than by specific phonetic qualities,
but some characteristics are evident:

1. Retracted stress is common in words like cement and insurance.

2. Words are frequently run together, and forms like you'ns, you'zl,
and y'all are common for you (plural), you will, and you all,
respectively.

3. [ɛ] always goes to [ɪ] before nasals, except in been and since,
where the opposite happens.

4. [?], [?], and [æ] are raised before all front consonants.

Black Urban speech is characterized by voice quality as much as by
any other factor, but a few outstanding phonetic tendencies are indicated:

1. Word-final stops are nearly always lost.

2. [θ] goes to [?] and [ð] to [d], particularly in pronouns and
demonstratives.

3. Stressed vowels diphthongize, and the resulting diphthong
sounds very like the first element.

4. Voiced consonants are often substituted for unvoiced ones;
the reverse situation occurs equally often.

5. Consonant clusters are simplified, usually by deletion of the
stop in syllable-final [sk], [sp], and [st] clusters.



The speech of New York City varies from borough to borough within
the city. Some characteristics of the speaker from Long Island are
listed here:

1. [?] appears whenever a low back rounded vowel is followed
by [r], as in horse.

2. Unrounded back vowels followed by [r] are lengthened, as in
New England speech, and the [r] is deleted.

3. [æ] is in free variation with [?].

4. In nasalization, the nasal consonants are absorbed by the
preceding vowels.

5. [ŋg] occasionally alternates with [ŋ], as in Long Island.

6. [l] is back and palatalized, often with no contact between
tongue and alveolar ridge.



The variety of British speech used in the study has been somewhat
Americanized, but it still retains the "clipped" quality of British speech
and has a variety of low back vowels, most of which are not heard in
General American speech.

1. Unstressed vowels reduce to [ɪ].

2. [æ] usually occurs in words like carry and parry; [ɛ] occurs
in monosyllables with [r].

3. [ɑ] is the so-called "broad a" in bath, half, aunt, etc.

4. [ɔ] is somewhat higher than American [ɔ], suggesting [o] when
followed by [r], [l], and [w].

5. [ɜ] occurs in words like bird, turn and murmur; final [r]
goes to [ə].

6. [?] occurs intervocalically.

7. [l] is clear and frontal.



The selection of the testing procedure presented the greatest problem. A test was needed that preserved the different dialectal intonations yet tested the intelligibility of specific words. The large number of listeners necessitated a test that could be easily scored. Tests in which the listeners write down their answers, whether sentences, words, or nonsense syllables, require a degree of phonetic sophistication and judgment on the part of both participant and scorer, particularly if the experimenter is interested in which errors occur.

Phonetically balanced word lists such as the Harvard PB Lists and various CVC word lists were unsuitable because intonation patterns are lost when the speaker pronounces one word at a time. Fairbanks' Rhyme Test and the Modified Rhyme Test, developed by House et al., are multiple-choice tests in which the alternative responses differ from the pronounced word by one phoneme. These tests, while eliminating the need for judgment in scoring, still present the problem of single-word utterances, which are inadequate for testing dialect intelligibility.

A further problem is that listeners have a choice among only four or five expected responses.

The Cloze Procedure test used in Harms' study presents the listener with a form on which a short narrative, heard previously, is printed with blanks replacing certain words. The subject is instructed to fill in the blanks with the exact words used by the speaker. This kind of test has listener comprehension as its main parameter, rather than auditory intelligibility.

The test selected for the study was the Multiple Choice Intelligibility Test, developed by Haugen, Black, et al. (1963). These tests are constructed of twelve lists of twenty-four words each. There are four forms, A, B, C, and D, and four alternate response forms, A-1, B-1, C-1, and D-1. Words are read in groups of three after a carrier phrase, pronounced with no pause, as if the item were an incomplete sentence. The carrier phrase is the number of the test item; there are eight items in each of the twelve lists of one test. Thus, the first item would look like:

Number 1 crook fair amble

The answer sheet includes four possible responses for each of the three words, and the listener is asked to consider each word and mark the response that matches what he or she heard.
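The item structure described above can be modeled in a short sketch. This is an illustration under stated assumptions, not part of the original test materials: the constants follow the figures given in the text (twenty-four words per list, three words per item), while the function names and the answer marks in the usage lines are hypothetical.

```python
from typing import Optional, Sequence

WORDS_PER_LIST = 24  # each list contains twenty-four words
WORDS_PER_ITEM = 3   # words are read in groups of three after the carrier phrase
ITEMS_PER_LIST = WORDS_PER_LIST // WORDS_PER_ITEM  # eight items per list


def carrier_phrase(item_number: int, words: Sequence[str]) -> str:
    """Build the spoken phrase for one item, e.g. 'Number 1 crook fair amble'."""
    if len(words) != WORDS_PER_ITEM:
        raise ValueError(f"an item has exactly {WORDS_PER_ITEM} words")
    return f"Number {item_number} " + " ".join(words)


def score_item(spoken: Sequence[str], marked: Sequence[Optional[str]]) -> int:
    """Count the correct words in one item; each word is scored separately.

    A mark of None stands for a blank (no answer), which counts as wrong.
    """
    return sum(1 for target, answer in zip(spoken, marked) if answer == target)


if __name__ == "__main__":
    print(ITEMS_PER_LIST)
    print(carrier_phrase(1, ["crook", "fair", "amble"]))
    # Hypothetical marks: first word right, second left blank, third wrong.
    print(score_item(["crook", "fair", "amble"], ["crook", None, "ample"]))
```

Scoring each word independently mirrors the text's statement that every word in an utterance is scored separately.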

The methodology of the test, i.e., the fact that each item of seven or eight syllables is read as a phrase, preserves the intonation and assimilation tendencies of each dialect, yet provides an exact measure of word intelligibility. Each word in a particular utterance is scored separately; analyses of variance have shown little or no difference among the three scores (Black, 1958). Because a multiple-choice format specifies the possible responses, the importance of linguistic sophistication among the listeners is reduced, and the study of confusion characteristics among a fixed population of words is made possible. The limitations that result from fixed responses are counterbalanced by the need for a test in which phonetic knowledge is not necessary.

The twelve lists of each test contain different words but are equivalent in difficulty. Equivalent but unlike lists are necessary to prevent a learning factor from affecting the reliability of the test as a measure of intelligibility. Forms A and B are somewhat less difficult, in that they yield higher mean scores than Forms C and D. Form A was chosen for this study because of the naivety of the high-school-age listeners who were employed.

Methodology 

Six speakers recorded two lists of twenty-four words each using an Ampex Model 350 tape recorder at 7 1/2 i.p.s. One list was taken from Form A of Black's Multiple Choice Intelligibility Tests; the second list was taken from the alternate response Form A-1. The lists of possible responses are identical for both forms; the correct responses are different for each form. The lists were assigned so that no speaker would read a list and its alternate. (Speaker 6 was added after the recordings were finished, so that, in fact, he recorded two similar lists.)



SPEAKER    DIALECT             LIST NO. A    LIST NO. A-1
1. C.B.    Columbus, Ohio      1             2
2. M.G.    New York-Jewish     2             3
3. B.N.    Rural Ohio          3             4
4. G.D.    British             4             5
5. J.H.    Columbus, Ohio      5             1
6. C.D.    Urban Black         6             6



The recordings were played on a Tandberg tape recorder to 65 senior high school students from four church groups located on the north side of Columbus, Ohio. The recordings were played in small meeting rooms under normal "classroom quiet," with no noise in the signal. Listeners recorded their responses on standardized, printed answer sheets (Appendix 2), which had been duplicated by Multilith from the booklet "Multiple Choice Intelligibility Tests." Instructions for the listeners were adapted from the same booklet. The answer sheets were scored and then checked by a second scorer, and a frequency count of all listener responses was made. Percentage counts were used to show how frequently each possible response was marked. Percentages were calculated by means of a simple Fortran program for an IBM 360 computer. The table was based on 63, the number of usable listener responses for each list, as 100%. Mean scores and standard deviations were also calculated by computer.
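The percentage conversion described above (63 usable answer sheets taken as 100%) amounts to a single division. The sketch below is a Python stand-in for that step, not a reconstruction of the original Fortran program; the function names are hypothetical. It rounds to two decimals, whereas the printed lists occasionally differ in the last digit (for 61 listeners both 96.82 and 96.83 appear).

```python
USABLE_LISTENERS = 63  # usable answer sheets per list; 63 listeners = 100%


def percent(count: int, total: int = USABLE_LISTENERS) -> float:
    """Convert a number of listeners to a percentage of the usable total."""
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and the number of listeners")
    return round(100 * count / total, 2)


def conversion_chart(total: int = USABLE_LISTENERS) -> dict:
    """Map every listener count 0..total to its percentage (cf. Appendix 1)."""
    return {n: percent(n, total) for n in range(total + 1)}


if __name__ == "__main__":
    chart = conversion_chart()
    for n in (1, 2, 10, 62, 63):
        print(f"{n:2d} listeners = {chart[n]:6.2f}%")
```

Every percentage in the result lists is therefore a multiple of 1/63, which is why values such as 1.59, 3.17, and 15.87 recur throughout them.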

Data analysis was based on the variance of mean intelligibility scores between dialects, using the Columbus speakers as controls and assuming the mean scores of the control dialect to represent 100% intelligibility. Actual deviations of the control scores from 100% were assumed to be functions of the testing procedure.
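One way to express the normalization described above, treating the control dialect as 100% intelligible, is to divide each speaker's mean by the control mean. The study does not report such ratios. This sketch assumes that the control value is the average of the two Columbus speakers' overall means (taken from the Results section), and it should be read as an illustration only; the names are hypothetical.

```python
# Overall mean scores (number correct out of 48) as reported in the Results.
MEAN_SCORES = {
    "J.H. (Columbus)": 45.24,
    "C.B. (Columbus)": 43.72,
    "G.D. (British)": 42.13,
    "C.D. (Black)": 39.83,
    "M.G. (New York)": 39.03,
    "B.N. (Rural Ohio)": 35.86,
}
CONTROL_SPEAKERS = ("J.H. (Columbus)", "C.B. (Columbus)")


def relative_intelligibility(scores: dict, control_keys) -> dict:
    """Express each mean as a percentage of the control dialect's mean.

    Assumption (not stated in the study): the control value is the average
    of the Columbus speakers' means, and that average counts as 100%.
    """
    control = sum(scores[k] for k in control_keys) / len(control_keys)
    return {k: round(100 * v / control, 2) for k, v in scores.items()}


if __name__ == "__main__":
    rel = relative_intelligibility(MEAN_SCORES, CONTROL_SPEAKERS)
    for speaker, value in sorted(rel.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{speaker:20s} {value:7.2f}%")
```

Because the ratio is a simple rescaling, it preserves the ranking of the raw means; it only changes the units in which the dialect differences are expressed.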

Results 



The results of the experiment are shown in Lists 1 through 6. The possible responses are shown on the left (N.A. indicates that no answer was given). The numbers are the percentages of listeners who indicated each response. Correct responses are underlined. Since the answer sheets for Forms A and A-1 are the same, two compilations are shown on each list. The speaker who read each list is shown by initials at the top. A percentage conversion chart in Appendix 1 gives the percentage of 63 corresponding to each number of listeners.

Mean scores for each dialect, the average number correct out of 48, are shown below in order from most intelligible to least intelligible to listeners from Columbus, Ohio.

Speaker J.H. - Columbus - 45.24
Speaker C.B. - Columbus - 43.72
Speaker G.D. - British - 42.13
Speaker C.D. - Black - 39.83
Speaker M.G. - New York - 39.03
Speaker B.N. - Rural Ohio - 35.86



Pages 1 and 2 of each listener's test form were separated for ease in scoring, so mean scores and standard deviations were calculated for each speaker's lists separately. In the table below, two scores are shown for each speaker; the upper score is from the Form A list, and the lower score is from the Form A-1 list.

SPEAKER    LIST NUMBER    MEAN     S.D.
J.H.       5              21.79    2.06
           1              23.46    0.86
C.B.       1              20.78    3.40
           2              22.60    2.20
G.D.       4              25.65    3.24
           5              21.46    6.95
C.D.       6              20.33    1.93
           6              19.44    1.82
M.G.       2              19.05    2.10
           3              19.97    1.82
B.N.       3              15.24    3.49
           4              18.97    3.68



It was noted that scores for the alternate response Form A-1 were slightly higher than those for Form A. This was not predicted in the preparation of the test materials, and both forms were combined in the calculation of the overall mean scores.
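Because each listener's two pages were scored separately, the overall score out of 48 is the sum of the two list scores, so the overall mean is the sum of the two list means. The sketch below illustrates this with hypothetical scores for five listeners; the function names are hypothetical. It uses Python's `statistics.stdev` (the sample standard deviation), which is an assumption, since the study does not state which formula its program used.

```python
from statistics import mean, stdev


def list_summary(scores):
    """Mean and sample standard deviation of per-listener scores on one list."""
    return mean(scores), stdev(scores)


def overall_scores(list_a, list_a1):
    """Each listener's combined score out of 48 (two 24-word lists)."""
    if len(list_a) != len(list_a1):
        raise ValueError("both lists must be scored for the same listeners")
    return [a + b for a, b in zip(list_a, list_a1)]


if __name__ == "__main__":
    # Hypothetical scores (number correct out of 24) for five listeners.
    form_a = [22, 20, 23, 21, 19]
    form_a1 = [23, 24, 22, 24, 23]
    mean_a, sd_a = list_summary(form_a)
    mean_a1, sd_a1 = list_summary(form_a1)
    combined = overall_scores(form_a, form_a1)
    # The overall mean equals the sum of the two list means.
    print(mean(combined), mean_a + mean_a1)
```

Note that the same additivity does not hold for standard deviations, which is one reason to report them per list as the table above does.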

Since no test of significance for percentages in groups of four could be located, any deviation over 15% (10 listeners of 63) will be considered in the analysis. Since some of the words on the test are easily confused in standard testing situations, some of these differences will not be explainable in terms of dialect differences, but rather as perceptual confusions inherent in the words and their alternate responses.
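The 15% criterion can be applied mechanically to each item. In this sketch (function name hypothetical), a wrong response is flagged when more than 15% of the listeners marked it; with 63 listeners that means at least 10 (10/63 = 15.87%, 9/63 = 14.29%). Leaving no-answer (N.A.) marks out of the check is an assumption. The example uses the percentages given for Speaker C.B.'s flicker item.

```python
THRESHOLD_PERCENT = 15.0  # "any deviation over 15% (10 listeners of 63)"


def flagged_responses(percentages: dict, spoken: str,
                      threshold: float = THRESHOLD_PERCENT) -> dict:
    """Wrong responses marked by more than `threshold` percent of listeners.

    `percentages` maps each printed response (and "N.A.") to the percentage
    of listeners who marked it; `spoken` is the word actually pronounced.
    N.A. is excluded here by assumption.
    """
    return {resp: pct for resp, pct in percentages.items()
            if resp not in (spoken, "N.A.") and pct > threshold}


if __name__ == "__main__":
    # Speaker C.B., List 1: responses when "flicker" was pronounced.
    flicker = {"QUICKER": 23.81, "FLICKER": 58.73, "SLICKER": 1.59,
               "LIQUOR": 15.87, "N.A.": 0.00}
    print(flagged_responses(flicker, "FLICKER"))
```

Both quicker and liquor cross the threshold for this item, matching the two confusions discussed for it below.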

The first Columbus speaker, J.H., shows only a few instances where fewer than 85% of the listeners responded correctly. In all but three cases, the confusions are between stops, or between a stop and ∅ (no consonant): word, were; plot, clock, blot; kind, pine, time; quit, quick; world, whirl.
Trial was mistaken for trail 15.87% of the time. The only plausible explanation for confusion between [aɪ] and [eɪ] would be that the listeners were mistaken in orthography. The speaker pronounced trial very clearly, and the experimenter can find no phonetic basis for the confusion.

Relieve was mistaken for relief 19.05% of the time. [v] and [f] are commonly confused in final position, and since relief is the final word of the utterance, the drop in volume would augment this tendency.

Legion was mistaken for legend by 22.22% of the listeners. In the test item, legend is followed by blunder, nearly obscuring the [d], if indeed it was pronounced at all. Legion-legend shows a tense-lax opposition which is confused in many Ohio pronunciations, as in /mez/ and /mez/.

The errors indicated for the other Columbus speaker, C.B., are somewhat more complicated. Court was mistaken for quart nearly half the time. C.B.'s [w] in quart was voiceless and nearly imperceptible. Instead of a clear [kw] cluster, she produced a slightly labialized [kʷ], which was due to her own idiolect rather than to any dialect characteristic. It is probable that [kʷ] would be common in all dialects.

An interesting error was that concerning the word flicker, which was heard by only 58.73% of the listeners. 15.87% heard liquor, easily explainable by the fact that flicker is preceded by group; [p] and [f] are quite similar, and [f] might easily be mistaken for the aspiration of [p]. But 23.81% of the listeners heard quicker. Even if it is assumed that the [p] creates confusion in the following word, there is no basis for explaining the perception of [kw] where [fl] was produced.

In the alternate response form of this item, when the speaker pronounced quicker, 100.00% responded correctly. It can only be assumed that flicker is a word with high confusion tendencies because of the low intensity of the [fl] cluster.

71.43% of the listeners heard rage correctly. The remaining listeners responded randomly among the other choices; four listeners did not respond at all.

Anger was mistaken for anchor 23.81% of the time; as in Speaker J.H.'s lists, voicing is confused, a function of the test words rather than of dialect.

The last case of confusion in the utterances of the Columbus speakers is between confer and confirm. The word immediately following is verse; those listeners who heard confirm must have overcompensated for voicing, inserting a labial consonant between [r] and [v].

Other errors of these types occur in the responses to speakers of the other dialects. These errors will not be analyzed, as they are functions of the test and are not induced by dialect. However, it should be noted that a greater number of test-induced errors occurred in the other four dialects than in the Columbus dialect, suggesting that overall intelligibility is affected by dialect, but not through predictable dialect errors.

One of the most outstanding features of the New York dialects is the distortion, or absence, of [r] following a vowel. Many of the confusions in the lists of the New York speaker, M.G. (Lists 2 and 3), occurred in words containing [r].
Only 66.67% of the listeners responded correctly to horror. 3.17% each heard father and borrow, and 26.98% heard power.

The production of horror showed a short [o] instead of [ɔ], and the [ə] which nearly always replaces [r] sounded very like a [w]; the semivowel was made necessary by the transition to the next syllable, which was [ʌ]. In syllables ending with a vowel followed by a word or syllable beginning with a vowel, as horror is pronounced in New York, [r] is often intrusive. However, dissimilating influences prevent the introduction of [r] in this position.¹ [w] is quite a common replacement for [r] in child language; thus it is predictable that listeners who are unfamiliar with a New York [r] would hear [w].

Speaker M.G.'s [r]-sounds tend to resemble [w] in all positions. This peculiarity is not to be considered a functionally defective [r], since it is heard throughout this dialect area. It seems evident that the [r] distortion creates confusion with other liquids, such as [l] and [w], as occurred when the speaker pronounced grow: 7.94% heard glow, and 9.52% heard go, with no liquid at all.

When drift was pronounced, only 12.70% of the listeners responded correctly. 49.21% heard drip, which can be explained in a manner similar to the arguments presented for Speakers C.B. and J.H., but 38.10% heard thrift. A [w]-like [r] would have a longer voicing feature than a clear [r], and a [d] with a weak onset might easily be mistaken for a [θ]. It is also common in this dialect for initial dental stops to be slightly affricated.

The responses generated by the production of gull are nearly random, but explainable by the New York substitution of [ɔ] for [ʌ] in stressed positions. Thus, 74.60% of the listeners heard a back vowel, responding with gall, gold, or goal.

Analysis of Speaker B.N.'s productions (Lists 3 and 4) was made difficult by the high percentage of listeners who did not indicate any response.

In many cases nearly all listeners who responded did so correctly, but percentage scores in these cases are only between 60% and 80%; as a result, it is impossible to guess what the remaining listeners thought they heard, since they could not decide themselves. Therefore, only those items with a significant number of wrong answers will be examined.

Most of the errors in Speaker B.N.'s dialect are consonant confusions of manner; a few are errors in place of articulation.

Speaker B.N. also exhibits diphthongization of vowels, common in the Southern speech area. This tendency has diffused throughout the Kentucky, West Virginia, and Southern Ohio area, creating what might be mistaken for a "Southern drawl." It is probable that this is the cause of many of the no-answer responses.

Two confusions are due to the backness of the [l], which occurs in both the Columbus and the Southern Ohio dialects. 28.57% of the listeners heard virtual when virtue was pronounced. In both dialects the two words sound nearly alike; unstressed syllabic [l] often suggests [u] or [o], and the two words are easily confused. In the second case, 11.11% heard meadow when mettle was pronounced. Medial [d] and [t] are usually flapped, and the syllabic [l] immediately following the flap is articulated so far in the back of the mouth as to suggest [o].
Of the mistakes in manner of articulation, the most consistent is spear (34.92%) for sphere. Sphere is one of only a few English words with an [sf] cluster and would probably be confused in any dialect.

22.22% of the listeners mistook kernel for curdle, a nasal for its homorganic stop. 44.44% heard burst for birch, an [st] cluster for a prepalatal affricate, [tʃ]. The word immediately following is praise, which could suggest a final stop rather than a fricative release.

When shave was pronounced, nearly 27% heard shade, a dissimilation from the [f] in effect, the word following. Other confusions of this type occur, as do mistakes in place of articulation, many more than occurred in the other tests. It is interesting that most of the listeners laughed when they heard the first few utterances of this speaker, perhaps an indication that they thought this dialect was very different from their own.

One example illustrates the similarity between the vowels [ɛ] and [ɪ] in the two Ohio dialects. When ten was pronounced (after chain), only 14.29% responded correctly. The remaining answers were nearly random among pen, pin, tent, and N.A. Here the stop confusion is not dialect-related, but in the alternate response list, pronounced by Speaker M.G., nearly one-third of the listeners mistook pen for pin; since these two vowels merge in the Ohio dialects, the listeners would differentiate them only with careful listening, if at all.

Final [t] in a cluster is lost in Southern dialects. This is illustrated by this speaker, where only 65.08% of the listeners heard plant.

The British dialect, spoken by Speaker G.D. (Lists 4 and 5), also shows a number of items with high percentages of N.A. responses, although this tendency was not consistent throughout the test. Most listeners tended to have either a great deal of trouble or very little with this dialect; relatively few scores are near the average, most lying at either end of the scale.

Intervocalic [r] is flapped in this dialect, as are [t] and [d] in American dialects, so when storage was pronounced, it suggested a medial [t]; 17.46% of the listeners heard shortage.

The consonants of this speaker that involve oral pressure seem to be characterized by their firmness, i.e., the onset is somewhat stronger than normal; thus some confusions in voicing result, as between folly and volley, and smashing and matching. Other consonant confusions were mainly of manner (reverse for revert), but few of the errors show percentages over 15%. The items where the correct response was marked less than 85% of the time were usually the items with high percentages of no answer. The extremely clipped quality of this dialect produces only a few test-induced assimilation errors. Since most of the errors were not consistent, little else can be said about dialectal influences on the test responses.

Most of the errors indicated in Speaker C.D.'s Black dialect (List 6) are confusions of final consonants and clusters, although there are a few vowel-diphthong mistakes. In both lists for this speaker, prod and proud were confused, although proud was taken for prod more often than the reverse. The speaker diphthongizes all stressed vowels, and the resulting diphthong is typically similar to the sound of its first element; thus words like prod and proud are nearly indistinguishable in this dialect. The tendency is a residual quality of Southern Negro speech which is frequently heard even in the Northern urban areas of the country.

Some consonant confusions occur that are not dialect-derived, mostly initial stop confusions and voiced stop-spirant confusions, but wherever the final consonant was the crucial element, mistakes occurred. Black speakers tend to drop or obscure final consonants in general, also a residue of Southern Negro speech; thus errors occurred for: new, noon, nude; law, log; term, turn; flat, flak; print, prince; wake, wait, wade; blast, black; jump, junk.

Tint and tense were confused; besides the problem of final consonants, there is the merging of [ɪ] and [ɛ], which also occurs in the Ohio dialects.

In urban dialects in general, [θ] often goes to [t]. Indications of this occurred in the test when 20.63% of the listeners heard fateful when faithful was pronounced. Confusion also occurred between suit and shoot, but this is believed to have been caused not by the speaker's dialect but by his tendency to distort [s] slightly.



Conclusion

The most intelligible speakers to listeners of the dialect of Columbus, Ohio, are speakers of the Columbus dialect. Relative intelligibility varies with dialect; dialects arranged in order from most to least intelligible are Columbus, British, Black, New York, and Rural Ohio.

Unfortunately, only a few specific instances of dialect features can be extracted from the mass of results for each list. Direct comparisons between lists are possible only for a list and its alternate response list. Some deviations occur in one list that do not occur in its alternate, suggesting differences in intelligibility between the speakers' dialects, but comparing successive lists is difficult because the test words are different.

A serious problem arose in evaluating the data: the test-induced assimilation errors. Although the number of these errors varies from dialect to dialect, they tend to obscure the general results. It is ironic that the feature for which the test was chosen, the phraselike structure of the test items, was the reason the data were so difficult to interpret. Scoring the tests is quite simple, but extracting the frequencies of all responses is very time-consuming, since it must be done by hand.
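The hand tally described above can be sketched as counting, for each word, how many answer sheets marked each printed choice. The function below is hypothetical. The example marks are listener counts inferred from the printed percentages for Speaker C.B.'s rage item (4 raid, 4 rate, 6 range, 45 rage, 4 no answer), not the original answer sheets.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def tally(marks: Iterable[Optional[str]], choices: Sequence[str],
          listeners: int = 63) -> dict:
    """Percentage of listeners marking each printed choice, plus N.A.

    `marks` holds one entry per usable answer sheet: the marked choice,
    or None when the listener left the word blank.
    """
    counts = Counter("N.A." if m is None else m for m in marks)
    unexpected = set(counts) - set(choices) - {"N.A."}
    if unexpected:
        raise ValueError(f"marks not on the answer sheet: {sorted(unexpected)}")
    if sum(counts.values()) != listeners:
        raise ValueError("expected one mark (or blank) per usable answer sheet")
    return {c: round(100 * counts[c] / listeners, 2) for c in [*choices, "N.A."]}


if __name__ == "__main__":
    # Counts inferred from the printed percentages for C.B.'s "rage" item.
    marks = ["RAID"] * 4 + ["RATE"] * 4 + ["RANGE"] * 6 + ["RAGE"] * 45 + [None] * 4
    print(tally(marks, ["RAID", "RATE", "RANGE", "RAGE"]))
```

Checking that every sheet contributes exactly one mark or blank catches transcription slips of the kind that make some printed columns fail to sum to 100%.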

Therefore, in the opinion of the experimenter, the usefulness of the test as a measure of dialect intelligibility is somewhat overshadowed by the assimilation errors caused by the testing procedure. Although the results did yield the predicted variations, some judgment was necessary to determine which errors were test-induced and irrelevant to the purpose of the study. Nevertheless, the multiple-choice format is believed to be the most desirable for studies of this kind. The great number of N.A. responses indicates that even more blanks would occur in a write-down test for naive listeners, because they simply would not know what to write down.



Footnote 

¹ Horror is seldom pronounced correctly by speakers of any dialect. What is usually heard is /hor*/.
LIST #1

RESPONSE        C.B.    J.H.     RESPONSE        C.B.    J.H.
FORM            0.00    0.00     GROUP          98.41    0.00
WARM            0.00  100.00     TROOP           0.00    0.00
SWARM         100.00    0.00     COUPE           0.00    0.00
STORM           0.00    0.00     FRUIT           1.59  100.00
N.A.            0.00    0.00     N.A.            0.00    0.00

CAMPUS          4.76   98.41     QUICKER        23.81  100.00
CANVAS         95.24    1.59     FLICKER        58.73    0.00
PAMPHLET        0.00    0.00     SLICKER         1.59    0.00
PANTHER         0.00    0.00     LIQUOR         15.87    0.00
N.A.            0.00    0.00     N.A.            0.00    0.00

COURT          42.86    4.76     BEEF           80.95    0.00
FORT            0.00    0.00     BEAST           4.76    0.00
PORT            7.94   95.24     BEAT           12.70    0.00
QUART          49.21    0.00     BEAM            0.00  100.00
N.A.            0.00    0.00     N.A.            1.59    0.00

AIRFORCE        1.59    0.00     REASON          1.59    0.00
AIRPORT        98.41    0.00     REGION          7.94    1.59
AIRCORPS        0.00   98.41     LEGION         84.13   22.22
AIRBORNE        0.00    1.59     LEGEND          4.76   76.19
N.A.            0.00    0.00     N.A.            1.59    0.00

SPARK           0.00    0.00     WONDER         87.30    0.00
PARK            3.17    0.00     BLUNDER         3.17  100.00
DARK            3.17   98.41     THUNDER         6.35    0.00
BARK           92.06    1.59     SPONSOR         0.00    0.00
N.A.            1.59    0.00     N.A.            3.17    0.00

TASSEL         98.41    1.59     CORN            1.59    0.00
TACKLE          1.59    0.00     TORN            0.00  100.00
CATTLE          0.00    0.00     HORN           96.82    0.00
PASTEL          0.00   98.41     BORN            0.00    0.00
N.A.            0.00    0.00     N.A.            1.59    0.00
LIST #1 (continued)

RESPONSE        C.B.    J.H.     RESPONSE        C.B.    J.H.
STRETCH         1.59    0.00     RAID            6.35   93.65
THREAT         90.48    1.59     RATE            6.35    6.35
DREAD           3.17   98.41     RANGE           9.52    0.00
BREAD           0.00    0.00     RAGE           71.43    0.00
N.A.            4.76    0.00     N.A.            6.35    0.00

HEAR            0.00    0.00     FITTING         0.00  100.00
STEER           1.59    0.00     PRETTY          0.00    0.00
NEAR            0.00  100.00     CITY           96.83    0.00
DEER           98.41    0.00     SITTING         0.00    0.00
N.A.            0.00    0.00     N.A.            3.17    0.00

GUARD           1.59    0.00     OWL             1.59    0.00
HEARTEN         1.59   96.82     CALL            0.00    0.00
GARDEN         96.82    1.59     HALL            7.94   98.41
BARGAIN         0.00    0.00     ALL            85.71    1.59
N.A.            0.00    1.59     N.A.            4.76    0.00

CURTAIN        85.71    1.59     UNCLE           6.35    0.00
PERTAIN         0.00    0.00     BUCKLE          1.59    1.59
PERSON          1.59    0.00     KNUCKLE        90.48   98.41
CERTAIN        11.11   98.41     STUCCO          0.00    0.00
N.A.            1.59    0.00     N.A.            1.59    0.00

EXPORT         87.30    0.00     DREAD           0.00    0.00
EXTORT          0.00   98.41     DRESS          96.82    1.59
EXPERT          6.35    0.00     REST            3.17   98.41
ESCORT          1.59    0.00     RED             0.00    0.00
N.A.            4.76    1.59     N.A.            0.00    0.00

FILE            0.00   98.41     SCREECH        84.12    0.00
PANEL           0.00    0.00     PREACH          3.17    0.00
FUNNEL          1.59    0.00     REACH           3.17    0.00
FINAL          95.24    1.59     STREET          7.94  100.00
N.A.            3.17    0.00     N.A.            1.59    0.00
LIST #2

RESPONSE        M.G.    C.B.     RESPONSE        M.G.    C.B.
SKID          100.00   12.70     HEART          76.19   98.41
SKIN            0.00    0.00     BARGE           0.00    0.00
HID             0.00   85.71     LARD            0.00    0.00
HIT             0.00    1.59     HARD           25.40    1.59
N.A.            0.00    0.00     N.A.            1.59    0.00

MOVE           68.25    3.17     FASTEN         85.71    1.59
MOOD           33.32    1.59     PASSION         3.17    3.17
FOOD            0.00   92.06     FASHION         7.94    0.00
SMOOTH          0.00    0.00     PASSING         1.59   95.24
N.A.            0.00    3.17     N.A.            3.17    0.00

SWIM            0.00    1.59     ANGLE           1.59    0.00
TWIN            0.00   95.24     AMBER           1.59    0.00
SWIFT           0.00    0.00     ANGER              ?   23.81
TWIST         100.00    1.59     ANCHOR          3.17   76.19
N.A.            0.00    1.59     N.A.            1.59    0.00

PROCLAIM       12.70    0.00     YOKE            1.59   96.83
DOMAIN          0.00  100.00     JOKE           98.41    3.17
COCAINE         0.00    0.00     CHOKE           1.59    0.00
PROFANE        88.89    0.00     DOPE            0.00    0.00
N.A.            0.00    0.00     N.A.            0.00    0.00

SPIN            7.94    0.00     CHAT            3.17   96.83
PIN             6.35   96.83     CHAP            6.35    1.59
THIN           69.84    1.59     SHACK          28.57    0.00
FIN            15.87    1.59     SHAFT          63.49    1.59
N.A.            1.59    0.00     N.A.            0.00    0.00

REPEAT          0.00    1.59     HEADING         0.00    0.00
RECEIVE        95.24    0.00     SITTING         0.00   96.83
RECEDE          6.35    0.00     KNITTING      100.00    1.59
REPRIEVE        0.00   96.83     FITTING         0.00    0.00
N.A.            0.00    1.59     N.A.            0.00    1.59
LIST #2 (continued)

RESPONSE        M.G.    C.B.     RESPONSE        M.G.    C.B.
COURT           4.76    7.94     PIPE           74.60    3.17
CORD            0.00   92.06     PIKE           25.90    0.00
HORSE           0.00    0.00     TYPE            0.00   95.24
COURSE         95.24    0.00     TIGHT           0.00    1.59
N.A.            0.00    0.00     N.A.            0.00    0.00

BALANCE        92.06    1.59     BEAST          92.06    0.00
BALLOT             ?   96.82     BEAT            3.17   98.41
GALLONS         0.00    0.00     MEAT            0.00    1.59
VALID           4.76    0.00     LEAST           4.76    0.00
N.A.            0.00    1.59     N.A.            0.00    0.00

DRANK          11.11    1.59     DRAY            0.00    0.00
RANK           88.89    0.00     GREY            1.59    1.59
RANCH           0.00    0.00     SPRAY          96.83    0.00
DRAG            0.00   96.83     PRAY            1.59   98.41
N.A.            0.00    1.59     N.A.            0.00    0.00

BANKING         0.00    0.00     THRIFT         38.10    1.59
FLANKING        3.17   98.41     DRIP           49.21    0.00
LANKY          96.83    0.00     DRIFT          12.70    0.00
BLANKET         0.00    0.00     GRIP            0.00   98.41
N.A.            0.00    1.59     N.A.            0.00    0.00

BORROW          3.17   93.65     CONFIRM        19.05   19.05
HORROR         66.67    1.59     CONFER          1.59   80.95
FATHER          3.17    0.00     CONSERVE       20.63    0.00
POWER          26.98    3.17     CONCERN        57.14    0.00
N.A.            0.00    1.59     N.A.            1.59    0.00

UNFOLD         88.89    3.17     VERSE           7.94   92.06
UNTOLD          6.35    3.17     FIRST          87.30       ?
CONTROLLED      0.00    0.00     BURST           4.76    3.17
UPHOLD          4.76   92.06     HURT            0.00    0.00
N.A.            0.00    3.17     N.A.            0.00    0.00


LIST #3

RESPONSE        B.N.    M.G.     RESPONSE        B.N.    M.G.
DEED            0.00    1.59     DIMMER          6.35    0.00
[illegible]     3.17    0.00     DINNER         84.13    0.00
SEED            6.35   95.24     THINNER         1.59   78.41
FEED           69.84    0.00     TINNER          0.00    1.59
N.A.           20.63    3.17     N.A.            7.94    1.59

PROTRUDE        1.59    3.17     ENVY               ?    0.00
CONCLUDE       73.02    3.17     EMPTY           1.59    0.00
CONSTRUED       1.59   92.06     ENTRY          20.63  100.00
INCLUDE         4.76    0.00     ENDING          0.00    0.00
N.A.               ?    1.59     N.A.           23.81    0.00

TRAIN          68.25    1.59     RUMOR          69.84   12.70
CRANE           9.52    1.59     ROAMER         12.70   85.71
STRAIN          1.59    7.94     RUBBER          0.00    0.00
TERRAIN         3.17   87.30     ROVER           1.59    1.59
N.A.           17.46    1.59     N.A.           15.87    0.00

VIRTUAL        28.57   96.83     SPHERE         55.56    0.00
CURFEW             ?    0.00     FEAR            3.17    1.59
VIRTUE         61.90    3.17     SPEAR          34.92    0.00
VIRGIN          0.00    0.00     BEER            0.00   96.83
N.A.               ?    0.00     N.A.            6.35    1.59

HIDE            3.17    0.00     GULL            7.94   25.40
FIVE            0.00    0.00     GALL            6.35   12.70
HIRE           30.93    0.00     GOLD           63.49   39.68
FIRE               ?  100.00     GOAL           15.87   30.16
N.A.           12.70    0.00     N.A.            6.35    0.00

STACK           3.17    0.00     PETAL           4.76    0.00
[illegible]    85.71    0.00     METTLE         77.78    1.59
CATCH           4.76    3.17     MEADOW         11.11    0.00
CAT             0.00   96.83     SETTLE          0.00   98.41
N.A.            6.35    0.00     N.A.            6.35    0.00
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RE5FCN3E 


3.N. 


io. 4 G • 


RESPONSE 


3 • N • 


* : » G . 


Fault 


?4.60 


7 . 9 ^ 


GLOW 


6.35 


7.94 


VAULT 


12.70 


8^.71 


GO 


90.48 


9.52 


DOG 


0.00 


0.00 


GROW 


0.00 




FOG 


0*00 


1.59 


GOAT 


0.00 


0.00 


M.A. 


12.70 


4.76 


N.a. 


3.17 


0.00 


3UR3T 


44.44 


7*1.60 


LATE 


3.17 


100 .00 


HURT 


1.52 


3.17 


LaDSM 


1.59 


0.00 


FIRST 


6.35 


12.70 


LAZY 


0.00 


0.00 


BIRCH 


23.81 


4.76 


LaDY 


02.06 


0.00 


N.A. 


15 TW 


1.59 


N.A. 


3.17 


0.00 


I Ra iJtM 


3 . 1 ? 


4.76 


BREAK 


80.95 


66.67 


TRACE 


6.35 


9,5.24 


RaKE 


7.94 


14.29 


PRaIu 


7}M 


0.00 


GREAT 


3.17 


T%B7 


PRaY 


4.76 


0.00 


grape 


3.17 


3.17 


Li « A . 


14.29 


0.00 


N.A 


4.76 


0.00 



3 LACK 


3.17 


1.59 


CHaNGE 


34.92 


9.52 


r.uci: 


0.00 


98.41 


chain 


50.97 


49.21 


SUCK 


90,48 


0.00 


STAIN 


1.59 


1.59 


FLaK 


ll59 


0.00 


•3 HAHE 


1.59 


39.68 


N.a. 


4.76 


0.00 


N.A. 


12.70 


0.00 


KERNEL 


22.22 


0.00 


PEN 


26.98 


30.16 


curdle 


MT90 


6.35 


PIN 


17.46 


66.67 


TURTLE 


11.11 


1.59 


TENT 


25.40 


1.59 


HURDLE


0.00 


92.06 


TEN 


14.29 


1.59 


N.A.


4.76 


0.00 


N.A. 


15.87


0.00 


GRAFT


0.00 


3.17 


HARD 


12.70 


0.00 


DRAFT


6.35 


68.25 


fart 


17.46 


0.00 


DRAB


63.49


28.57


HARSH 


3.17 


98.41 


GRAB 


26.98


0.00 


HEART 


53.97


0.00


N.A. 


3.17 


0.00 


N.A. 


12.70 


0.00 
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RESPONSE 


G.D.


B.N. 


RESPONSE 


S.D.


B.N. 


STARDOM 


3.17 


0.00 


EIGHT 


93.65 


0.00 


PARDON


84.13


1.59 


ACHE 


3.17 


1.59 


GARDEN 


0.00 


98.41


HATE 


0.00 


96.83


AUTUMN 


1.59 


0.00


BAKE 


0.00 


0.00 


N.A.


11.11


0.00 


N.A. 


3.17 


1.59 


CALL


1.59 


0.00 


REVOLVE 


0.00 


7.94 


BALL 


6.35 


96.83


INVOLVE 


0.00 


0.00 


HALL 


77.78


0.00 


RESOLVE 


1.59 


88.89 


SMALL 


1.59 


0.00 


DISSOLVE 


95.24


0.00


N.A. 


12.70 


3.17 


N.A. 


3.17 


3.17 


BUBBLE


7.94 


0.00 


NEEDLE


95.24 


3.17 


STUBBLE 


1.59 




FETAL 


0.00 


3.17 


TROUBLE 


4.76 


1.59 


EAGLE 


1.59 


0.00 


DOUBLE


76.19 


3.17 


BEETLE 


0.00 


88.89 


N.A. 


9.52 


1.59 


N.A. 


3.17 


4.76



TOP 


88.89


3.17 


ABLE 


0.00 


0.00 


HOP 


0.00 


0.00 


STABLE 


0.00 


0.00 


POP 


7.94 


9.52 


FABLE 


93.65 


1.59 


PROP 


1.59 


87.30 


TABLE 


1.59


92.06 


N.A. 


1.59 


0.00 


N.A. 


4.76 


4.76 


TOOL 


1.59 


88.89 


RECLINE 


88.89 


9.52 


CRUEL 


92.06 


6.35


REFINE 


4.76 


6.35 


DROOL 


1.59 


1.59 


RECLAIM 


3.1? 


4.76 


COOL 


1.59 


0.00 


REPLY 


0.00


73.02


N.A.


3.17 


3.17 


N.A. 


3.17 


6.35


STORAGE 


76.19 


6.35 


FOLLY 


12.70 


73.02 


PORRIDGE 


0.00


87.30


VOLLEY 


82.54 


19.05 


SHORTAGE 


17.46 


4.76 


POLISH 


0.00 


0.00 


STORY 


3.17 


0.00 


TROLLEY 


0.00


0.00 


N.A.


3.17 


1.59 


N.A. 


4.76 


4.76 
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RESPONSE


G.D.


B.N.


GAVE 


0.00 


0.00 


SHADE 


92.06 


26.98 


FADE 


3.17 


0.00 


SHAVE 


1.59 


68.25 


N.A. 


1.59 


4.76


EFFECT 


9.52 


77.78 


EXPECT 


0.00


0.00


INSPECT 


3.17 


1.59 


INFECT 


84.13 


9.52 


N.A.


3.17


11.11 


HARD


1.59 


84.13 


CARD 


92.06 


4.76 


CORD 


1.59 


1.59 


HARSH 


0.00 


1.59 


N.A.


4.76


7.94


STRANGE 


19.05


0.00 


BRING


11.11 


0.00 


RAIN 


3.17 


88.89 


BRAIN 


38.73 


1.59 


N.A. 


9.52 


9.52 


WAD 


1.59 


77.78


WASH


1.59


4.76


SQUAD 


79.37 


4.76 


5 QUAD U 


9.52 


1.59 


N.A. 


7.94 


11.11 


PLANT 


3.17 


0.00 


CLAMP 


4.76 


4.76 


CRAMP


15.87


85.71


TRAMP


69.84 


0.00 


N.A. 


6.35


9.52 



RESPONSE 


G.D. 


n >T 


CLAD 


3.17 


3.17 


CLAN 


9.52 


6.35 


PUN 


79.37


12.70 


PUNT 


1.59 


65.08 


N.A. 


4.76 


12.70 


LIFT 


88.89 


31.75 


RIFT 


3.17 


14.29 


DRIFT 


3.17


12.70


LIST 

N.A. 


1.59 

3.17 


23.81 


BEHAVE 


1.59 


0.00 


WITHHOLD 


6.35 


9.52 


REVOLT 


0.00 


73.02 


BEHOLD 


88.89 


3.17 


N.A. 


3.17 


14.29 


QUARRY 


0.00 


9.52 


GLORY 


92.06 


33.33 


GORY 


3.17 


33.97 


SORRY 


0.00 


0.00 


N.A. 


0.00 


4.76 


SUCH 


1.59 


73.02 


TOUCH 


1.59 


7.94


NUT 


96.83 


1.59 


BUTT 


0.00 


6.35 


N.A.


0.00 


11.11 


FORCE 


100.00 


7.94 


FOURTH 


0.00 


6.35 


COURSE 


0.00 


3.17 


HORSE 


0.00 


76.19 


N.A. 


0.00 
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RESPONSE


J.H.


G.D.


RESPONSE


J.H.


G.D.


COCK 


12.70 


1.59 


TOOK 


0.00 


90.48 


CROCK 


87.30 


0.00 


SHOOK 


22& 


0.00 


VR-'nK 


0.00 


98.41 


SHOCK 


6.35 


1.59 


■ICC K 


0.00 


0.00 


COCK 


0.00 


0.00 


N.A. 


0.00 


0.00 


N.a. 


0.00 


7.94 


FAIR


95.24


1.59


OPEN


0.00 


1.59 


BaR hi 


3.17 


6.35 


030E 


4.76 


65.08 


CARE 


0.00 


1.59


OPAL


93.65 


20.63 


pnl:i 


0.00 


90.48 


OVAL 


1.59 


3.17 


N.A.


0.00 


0.00 


N.A.


0.00 


9.52 




0.00 


0.00 


TRIAL 


15.87 


4.76 


iviPLr. 


11.11 


1.59 


FILE 


0.00 


0.00 


AE3LL 


87.30 


1.59 


FiialL 


3.17 


77.78 


APPLE


1.59


96^1 


TRAIL 


82^ 


7.94 


N.A. 


0.00 


0.00 


N.a. 


0.00 


7.94



BRINK 


12.70 


87.30 


FLAKE 


100.00 


1.59 


BRIDGE


1.59 


0.00 


BLaHE 


0.00


1.59 


BRISK 


0.00 


1.59 


CLnls; 


0.00 


95.24 


BRICK


84.13


4.76 


F Lri. NE 


0.00 


0.00 


N.A.


1.59 


4.76 


N.A. 


0.00 


1.59 


SKIM 


0.00


88.89 


NORM 


7.94 


3.1? 


T/EN 


0.00 


3.17 


WORK 


0.00 


1.59


VL-i 


0.00 


0.00 


WORD 


9.52 


92.06 


DIN 


98.41


0.00 


WRV.E 


79.37


3.17 


N.A. 


1.59 


9.52 


N.A.


3.17 


0.00 


ACTION 


0.00 


0.00 


RELIEVE 


19.05 


71.43 


x-iaTC ting 


95.24 


6.35 


RECEIVE 


0.00 


0.00 


MAGIC


3.17


3.17


RELIEF 


79.37 


28.57 


SMASHING 


0.00 


80.95 


RELEASE 


1.59 


0.00 


N.A.


1.59


9.52


N.A.


0.00 


0.00 
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RESPONSE




G.D. 


RESPONSE 


J.H. 


G.D. 


CLOCK 


4.76 


100.00 


WORLD 


gP.Qg. 


6.35 


BLOCK


1.59 


0.00 


WHIRL 


39.68


1.59 


PLOT 


84.13 


0.00 


WOOL 


0.00 


6.35 


BLOT 


TT/t 


0.00 


WOULD 


0.00 


84.13 


N.A.


4.76 


0.00 


N.A. 


0.00 


1.59 


KIND 


80.95 


0.00 


HAPPY


0.00 


0.00 


PINE 


9.52 


0.00 


HANDY 


100.00 


0.00 


FINE 


1.59 


100.00 


CANDY 


0.00 


96.83 


TIKE 


4.76 


0.00 


ENVY 


0.00 


1.59 


N.A. 


3.17


0.00 


N.A. 


0.00 


1.59 


LEAPING


0.00 


1.59 


DODGE 


0.00 


96.83


SLEEPING 


98.41 


0.00 


DARK 


3.17 


0.00 


CREEPING 


0.00 


0.00 


DOT 


90.48


3.17 


ttEAPI TG 


0.00 


98.41 


DOCK 


4.76 


0.00 


N.A.


1.59 


0.00 


N.A.


1.59 


0.00 



EIGHTY 


98.41 


1.59 


CONSCRIPT 


0.00 


3.17 


AC RING 


0.00


0.00 


CONFLICT 


0.00 


0.00 


DAINTY 


0.00 


87.30 


ASSIST 


0.00 


95.24 


BABY


1.59 


3.17 


UNFIT 


98.41 


0.00


N.A. 


0.00 


7.94 


N.A. 


1.59 


1.59 


PROOF 


0.00 


87.30


REFER 


0.00 


1.59 


HOOP 


0.00 


4.76 


REHEARSE 


6.35 


3.17 


GROUP 


0.00 


0.00 


REVERSE 


93.65 


22.22 


SWOOP 


100.00 


0.00 


REVEST 


0.00 


71.43 


N.A. 


0.00 


7.94 


N.A. 


0.00 


1.59 


WHIP 


0.00 


0.00 


BUDGET 


98.41 


0.00 


QUIT 


84.13 


0.00 


BUCKET 


1.59 


98.41 


QUICK 


15.87


1.59 


BUNION 


0.00 


0.00


TWIST 


0.00 


93.65 


BUDGE 


0.00 


0.00 


N.A. 


0.00 


4.76 


N.A.


0.00


1.59 
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RESPONSE


C.D. 


C.D. 


RESPONSE


C.D. 


C.D. 


Sw'JIRM 


0.00 


0.00 


NEGLECT 


0.00 


0.00


:r lRi'i 


0.00 


0.00 


DEPXECT 


95.24


0.00


TERM 


88.89 


38.10 


REFLECT 




98.41 


TURN 


11.11


61.90 


REFLEX 


1.59


1.59 


N.A. 


0.00 


0.00 


N.A.


0.00 


0.00 


HATE


96.83 


1.59 


LOST 


0.00 


0.00 


HASTE 


0.00 


0.00 


LONG 


1.59


0.00 


EIGHT 


3.17 


0.00 


LOG 


28.57 


57.14


TAKE 


0.00 


98.41 


La« 


60.32 


42.86 


N.A.


0.00 


0.00 


N.A. 


9.52 


0.00 


COMMIT


98.41 


12.70 


ROBBER


0.00 


98.41 


SUBMIT 


0.00 


0.00 


JOBBER 


93.65


1.59


PERMIT 


0.00 


1.59 


HARBOR 


3.17 


0.00 


COMMENCE 


1.59 


85.71 


SHOPPER 


3.17 


0.00 


N.A.


0.00 


0.00 


N.A. 


0.00 


0.00 



CLOUD 


0.00 


0.00 


HELD 


0.00 


95.24


CROWD 


7.94 


3.17 


BELL 


3.17 


1.59 


PROUD 


80.95


39.68 


FELL 


9.52 


3.17


PROD 


11.11 


57.14


TELL 


85.71 


0.00 


N.A. 


0.00 


0.00 


N.A. 


1.59


0.00 


IVaIST 


96.83


3.17 


INVITE 


88.89 


3.17 


WAKE


0.00 


50.79


INSIGHT 


“5733 


0.00 


WADE 


3.17 


25.40 


INSIDE 


0.00 


6.35


WAIT 


0.00 


19.05 


ADVICE


1.59


87.30 


N.A.


0.00 


1.59 


N.A.


3.17 


3.17 


FEELING 


0.00 


6.35 


BUST 


0.00 


68.25


MEETING 


7.94 


4.76 


FLAT


73.02 


6.35


FEEDING 


0.00 


87.30 


FUK 


23.81


6.35 


MEANING


92.06


1.59 


BLACK


1.59


15.87 


N.A. 


0.00 


0.00 


N.A.


1.59


3.17 
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RESPONSE 


C.D. 


C.D. 


RESPONSE 


C.D. 


C.D. 


PLAYFUL 


0.00 


100.00 


EGG 


3.17 


100.00 


FRUITFUL


79.37 


0.00 


EDGE 


95.24


0.00 


FATEFUL 


20.63 


0.00 


HEDGE 


1.59 


0.00 


BASEBALL 


0.00 


0.00 


HEAD 


0.00 


0.00 


N.A.


0.00 


0.00 


N.A. 


0.00 


0.00 


SUIT 


77.78 


0.00 


FINDING 


0.00 


0.00 


SHOOT 


22.22


0.00 


BINDING 


100.00 


95.24 


BOOT


0.00 


1.59 


BLINDING 


0.00 


4.76 


FRUIT


0.00 


98.41 


UNDING 


0.00 


0.00 


N.A. 


0.00 


0.00 


N.A. 


0.00 


0.00 


DEPEND 


0.00 


1.59 


TINT 


0.00 


00 
r — 

• 


DETAIN 


15.87 


96.83 


PRINT 


28.57 


1.59 


3 i'jCaiA L 


82.54 


0.00 


PRINCE 


69.84


0.00 


ret*ik 


1.59 


1.59 


TENSE 


1.59 


14.29 


N.A. 


0.00 


0.00 


N.A. 


0.00 


0.00


PLUiU:. 


0.00 


0.00 


DESK 


95.24 


3.17 


NSUTFUL 


0.00 


0.00 


DECK 


0.00 


95.24 


RURAL


80.95 


4.76 


DEATH 


3.17 


0.00 


RULER 


19.05 


95.24 


DEBT 


0.00 


1.59 


N.A.


0.00 


0.00 


N.A. 


1.59 


0.00 


NOUN 


4.76 


0.00 


BOTH 


1.59 


0.00 


NEW 


36.51 


61.90 


BOAT


34.92 


100.00 


NUDE 


15.87


38.10 


VOTE 


63.49 


0.00 


NOON 


42.86 


0.00 


QUOTE 


0.00 


0.00 


N.A.


0.00 


0.00 


N.A. 


0.00 


0.00 


BRAVE


1.59


0.00 


YAWN 


0.00 


0.00


STAVE 


6.35 


92.06 


JUMP 


0.00 


82.54 


BATHE


1.59


1.59


JUNK 


0.00 


yTm. 


SAVE 


90.48 


6.35 


YOUNG 


100.00 


0.00 


N.A. 


0.00 


0.00 


N.A. 


0.00 


0.00 
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APPENDIX 1

BASED ON 63 LISTENERS

NUMBER WRONG   PERCENTAGE   NUMBER WRONG   PERCENTAGE
1              1.59         33             52.38
2              3.17         34             53.97
3              4.76         35             55.56
4              6.35         36             57.14
5              7.94         37             58.73
6              9.52         38             60.32
7              11.11        39             61.90
8              12.70        40             63.49
9              14.29        41             65.08
10             15.87        42             66.67
11             17.46        43             68.25
12             19.05        44             69.84
13             20.63        45             71.43
14             22.22        46             73.02
15             23.81        47             74.60
16             25.40        48             76.19
17             26.98        49             77.78
18             28.57        50             79.37
19             30.16        51             80.95
20             31.75        52             82.54
21             33.33        53             84.13
22             34.92        54             85.71
23             36.51        55             87.30
24             38.10        56             88.89
25             39.68        57             90.48
26             41.27        58             92.06
27             42.86        59             93.65
28             44.44        60             95.24
29             46.03        61             96.83
30             47.62        62             98.41
31             49.21        63             100.00
32             50.79
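Each percentage in Appendix 1 is the number of listeners divided by 63, multiplied by 100 and rounded to two decimals. The Python sketch below regenerates that conversion. It also adds an inverse check, `is_possible`, which tests whether a percentage in the response tables corresponds to a whole number of listeners. That check is an added convenience rather than part of the original study, and all function names are hypothetical.

```python
# Sketch of the Appendix 1 conversion (count of listeners -> percentage).
# The inverse check is an added convenience, not part of the original study.
LISTENERS = 63  # number of listeners stated in Appendix 1


def percent(count: int, listeners: int = LISTENERS) -> float:
    """Percentage of listeners, rounded to two decimals as in Appendix 1."""
    return round(100.0 * count / listeners, 2)


def count_from_percent(pct: float, listeners: int = LISTENERS) -> int:
    """Nearest whole number of listeners corresponding to a percentage."""
    return round(pct * listeners / 100.0)


def is_possible(pct: float, listeners: int = LISTENERS) -> bool:
    """True if pct is exactly the rounded percentage of some whole count."""
    return percent(count_from_percent(pct, listeners), listeners) == pct


# Rebuild the appendix as (count, percentage) pairs.
appendix = [(n, percent(n)) for n in range(1, LISTENERS + 1)]
print(appendix[:3])                            # [(1, 1.59), (2, 3.17), (3, 4.76)]
print(is_possible(85.71), is_possible(1.39))   # True False
```

For example, a table entry of 1.39 cannot arise from 63 listeners, whereas 1.59 (one listener) can.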
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Intensity and Duration Analysis of Hungarian Secondary Stress 

Richard Gregorski, Ohio State University 
Andrew Kerek, Miami (Ohio) University 




It is generally agreed that in Hungarian, primary stress always
falls on the first syllable of a word. Fonagy (1966) found no
consistent acoustic correlate of this stress, but did find a correspondence
between the activity of the internal intercostal muscles and stress.
However, Magdics' study (1969) seems to indicate that stressed vowels
are generally more intense, longer, and higher in pitch than their
unstressed counterparts.

The status of secondary stress (both its placement and its rhythmic
function) has been much disputed (Rakos, 1966). There are two main
proposals regarding the placement of secondary stress: the position
theory and the syllable-length theory.¹ Kerek (in press) attempts to resolve the
issue by offering an alternative which accounts for secondary stress
placement in terms of context, that is, "on the basis of the speaker's
(subconscious) anticipation of the stress conditions in the immediately
following context." Closely connected with this theory are certain
constraints related to syllable length and unstressed syllable sequences.

Despite the general interest in Hungarian secondary stress, there
exists, to our knowledge, no experimental research into either its
acoustic or its physiological basis. The purpose of this study was to
determine to what degree intensity and duration function as acoustic
correlates of this secondary stress.

It was assumed that the appearance of secondary stress on a vowel
would manifest itself, in terms of intensity and duration, as an increase
in these parameters over the vowel's unstressed counterpart, and not
necessarily as absolute intensity or duration prominence over adjacent
syllables. This is consistent with the view that stress is correlated
with effort of production, i.e., that both stress production and
perception involve a knowledge of the intrinsic physical parameters
of a syllable and the consequent adjustment of effort needed to mark
the presence of stress. Also important in stress analysis is the
magnitude of the increase, for it is doubtful that an imperceptible
increment can have any functional significance. It was decided that
the general perceptual thresholds of approximately 1 dB for intensity and 10-40
msec for duration (Lehiste, 1970) would serve as a fair indicator of
the potential perceptual significance of intensity and duration increases.
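This criterion amounts to a simple comparison. In the Python sketch below, the 1 dB intensity threshold and the 10-40 msec duration range come from the text. Which point of the duration range to apply is left to the caller as a parameter. The measurements in the example are hypothetical, not values from Tables I and II.

```python
# Thresholds taken from the text (Lehiste, 1970): about 1 dB for intensity,
# 10-40 msec for duration. Which point of the duration range to use is a
# choice left to the caller.
INTENSITY_JND_DB = 1.0
DURATION_JND_RANGE_MS = (10.0, 40.0)


def increment(stressed: float, unstressed: float) -> float:
    """Increase of the (secondarily) stressed vowel over its unstressed counterpart."""
    return stressed - unstressed


def reaches_threshold(stressed: float, unstressed: float, threshold: float) -> bool:
    """True if the increase is at least as large as the given threshold."""
    return increment(stressed, unstressed) >= threshold


# Hypothetical measurements, not data from this study.
print(reaches_threshold(42.0, 41.0, INTENSITY_JND_DB))          # True
print(reaches_threshold(41.0, 41.0, INTENSITY_JND_DB))          # False
print(reaches_threshold(78.0, 72.0, DURATION_JND_RANGE_MS[0]))  # False
```

A negative increment, where the stressed vowel is weaker or shorter, never reaches the threshold, which matches the assumption that stress shows up as an increase.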

The following set of sentences was chosen for the experiment
(' = primary stress; ~ = secondary stress):

1. A. [fɛʃtɛtːeːk pɛtit] "They painted Pete."

B. [fɛʃtɛtːeːk pɛtit] "They painted Pete."

2. A. [fɛʃtɛtːeːtɛk pɛtit] "You (pl.) painted Pete."

B. [fɛʃtɛtːeːtɛk pɛtit] "You (pl.) painted Pete."
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2. C. [fɛʃtɛtːeːtɛk pɛtit] "You (pl.) painted Pete."

3. A. [fɛʃtɛgɛtːeːtɛk pɛtit] "You (pl.) kept painting Pete."

B. [fɛʃtɛgɛtːeːtɛk pɛtit] "You (pl.) kept painting Pete."

C. [fɛʃtɛgɛtːeːtɛk pɛtit] "You (pl.) kept painting Pete."

4. A. [fɛʃtɛgɛthɛtːeːtɛk pɛtit] "You (pl.) may have kept painting Pete."

B. [fɛʃtɛgɛthɛtːeːtɛk pɛtit] "You (pl.) may have kept painting Pete."

5. A. [fɛʃtɛgɛthɛtːeːtɛk iʃ pɛtit] "You (pl.) may have also kept painting Pete."

B. [fɛʃtɛgɛthɛtːeːtɛk iʃ pɛtit] "You (pl.) may have also kept painting Pete."



These sentences were chosen for the following reasons: (1) the numerous
voiceless fricatives and plosives would facilitate segmentation;
(2) for the most part, the vowel qualities could be kept constant
throughout the expanding sequences; and (3) a variety of secondary stress
placements could be employed.

The subject (AK), a trained linguist, is a native of Budapest,
Hungary, who has lived in the United States since 1957. He constructed
the test sentences, which exhibited possible secondary stress patterns
in his dialect. He was presented with a randomized list consisting of 
ten occurrences of each of the sentence patterns (except 2.C. and 3.C.) 
and was asked to produce the sentences at his normal rate of speech. He 
was then instructed to produce 2.C. and 3.C. (the alternate secondary 
stress assignments for 2.B. and 3.B. respectively) ten times each. This 
procedure was followed since a randomization of 2.C. and 3.C. within the 
first list might have introduced an uncontrolled variable into the 
experiment, that is, the subject could have inadvertently substituted 
2.C. for 2.B. and 3.C. for 3.B. or vice versa. He then repeated the 
first list and the alternate patterns. Two additional similar sessions 
followed at intervals of about a week, at the end of which about 60 
productions of each pattern or approximately 720 utterances for the total 
set had been recorded. 

The recorded utterances were processed by a Frøkjær-Jensen
intensity meter and pitch meter, the output of which was converted by
an Elema-Schönander Mingograph (100 mm/sec) into a three-channel display:
(1) oscillogram, (2) intensity curve, and (3) fundamental frequency
pattern. The duration of the vowels was measured to the nearest 1/2
millimeter (i.e., 5 milliseconds). The intensity of the vowels was
measured in terms of peak sound pressure level in dB relative to an
arbitrary level.
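The conversion from trace length to time follows directly from the paper speed. At 100 mm/sec, 1 mm corresponds to 10 msec, so the 1/2 mm resolution corresponds to 5 msec. Below is a minimal Python sketch. It assumes conventional round-half-up to the nearest 0.5 mm, since the text does not say how ties were handled.

```python
import math

PAPER_SPEED_MM_PER_S = 100.0  # Mingograph paper speed given in the text


def mm_to_ms(length_mm: float, paper_speed: float = PAPER_SPEED_MM_PER_S) -> float:
    """Convert a length on the paper trace (mm) to time (msec)."""
    return length_mm * 1000.0 / paper_speed


def to_nearest_half_mm(length_mm: float) -> float:
    """Round a measured length to the nearest 0.5 mm (halves rounded up)."""
    return math.floor(length_mm * 2 + 0.5) / 2


print(mm_to_ms(0.5))                      # 5.0, the measurement resolution
print(mm_to_ms(to_nearest_half_mm(7.3)))  # 75.0
```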

Table I presents the intensity results. There were no differences
between the vowels with secondary stress and their unstressed
counterparts. Note that there was a 1 dB difference between the unstressed
[ɛ]'s of -[tɛt]- in 2.A-C and between the unstressed [ɛ]'s of -[tɛt]-
in 4.A-B. However, such differences did not occur between similar
unstressed vowels within the other sentences.
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TABLE I

AVERAGE INTENSITY OF VOWELS IN UTTERANCES OF VARIOUS LENGTHS (in dB)

(Secondary stressed vowels underlined)

Sentence   Syllable Type
Type       te(t)   get   het   te:(k)   tek   iʃ

1A         43                  41
1B         43                  41
2A         43                  41       42
2B         44                  41       42
2C         43                  41       42
3A         43      43          41       41
3B         43      43          ?        41
3C         43      43          41       41
4A         43      43    42    41       41
4B         43      4?    42    4?       42
5A         43      43    42    41       41    38
5B         43      ?     42    41       41    38

Table II presents the duration results. There was a 1-7 msec
difference between unstressed vowels of the same syllable sequence
within the A-B-C comparisons, and also between the secondary stressed
vowels of the same syllable sequences in the A-B-C comparisons. In
six of the seven unstressed versus secondary stressed comparisons,
the unstressed vowel was longer than its secondary stressed
counterpart; the range of these differences was 6-12 msec. In only one
comparison (1A-B) was the secondary stressed vowel longer; the
difference was 14 msec.
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TABLE II

AVERAGE DURATION OF VOWELS IN UTTERANCES OF VARIOUS LENGTHS (in msec)

(Secondary stressed vowels underlined)

Sentence   Syllable Type
Type       te(t)   get   het   te:(k)   tek   iʃ

1A         72                  74
1B         67                  88
2A         71                  88       70
2B         72                  83       66
2C         66                  76       56
3A         58      ?           87       69
3B         58      ?           80       6?
3C         51      ?1          68       55
4A         56      80    59    89       70
4B         56      79    55    83       64
5A         55      81    5?    86       68    57
5B         56      80    5?    8?       65    51


Since the average differences fall below the just noticeable
differences, intensity and duration cannot be considered acoustic
correlates of secondary stress. However, since the fundamental
frequency of the vowel comparisons had not been analyzed, this
parameter could not be ruled out as a possible correlate. To
determine whether this was a promising direction for a future study, a
perceptual test was given to the subject to see whether he could in fact
perceive the stress patterns that he had produced. The subject was
presented with a tape of twenty randomized productions of the sentences:

2. B. [fɛʃtɛtːeːtɛk pɛtit] "You (pl.) painted Pete."

C. [fɛʃtɛtːeːtɛk pɛtit] "You (pl.) painted Pete."

and twenty randomized productions of the sentences:

3. B. [fɛʃtɛgɛtːeːtɛk pɛtit] "You (pl.) kept painting Pete."

C. [fɛʃtɛgɛtːeːtɛk pɛtit] "You (pl.) kept painting Pete."

These were the two sets of sentences in which alternate secondary stress 
assignments occurred. The subject was asked to assign secondary stress 
to each sequence. He correctly identified 6 out of 20 sequences in 
the 2.B-C set, and 10 out of 20 sequences in the 3.B-C set. Hence, 
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his judgments were random. We conclude that an explanation of 
Hungarian secondary stress in terms of acoustic and perceptual 
correlates does not seem promising. 
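The statement that the subject's judgments were random can be checked with an exact two-sided binomial test against chance. Chance here is p = 0.5, since each item had two possible stress assignments. The Python sketch below uses only the standard library; the test itself is not one reported in the study.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials at p = 0.5.

    The distribution is symmetric when p = 0.5, so the two-sided value is
    twice the smaller tail probability, capped at 1.
    """
    total = 2 ** n
    lower = sum(comb(n, i) for i in range(k + 1)) / total
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total
    return min(1.0, 2 * min(lower, upper))


# Scores reported in the text: 6/20 (2.B-C) and 10/20 (3.B-C).
for label, correct in [("2.B-C", 6), ("3.B-C", 10)]:
    p = binomial_two_sided_p(correct, 20)
    print(f"{label}: {correct}/20 correct, p = {p:.3f}")
```

Neither score differs from chance at the conventional 0.05 level (p ≈ 0.115 and p = 1.000). This is consistent with the conclusion drawn in the text.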



Footnote 



¹Most linguists who have commented on Hungarian stress hold
that secondary stress occurs on the third and every subsequent odd-numbered
syllable of a word, i.e., according to numerical syllable
position. Some linguists, notably Szinnyei and Lotz, point out that
a short third (or any other short odd-numbered) syllable causes the stress to
shift to the following even-numbered syllable; hence, in this view,
the relevant condition is the length value of a syllable. For
references, see Kerek (in press).
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1. Introduction

This paper constitutes a first report on an experiment designed
to test the relevance of various suprasegmental parameters in the
perception of quantity in Estonian. The test materials consisted of
synthetically produced acoustic stimuli, intended to sample systematically
the acoustic spaces containing the minimal triples taba -
tapa - tappa and sada - saada! - saada. The synthesis was performed
by means of a Digital Data Processor (DDP-24) computer at the Bell
Telephone Laboratories. The synthesis was carried out entirely
by rule, i.e., no attempt was made to imitate a known speaker. The
stimuli will be described below in more detail. Test tapes containing
randomized stimuli were presented to 26 listeners, all native
speakers of Estonian, at the Experimental Phonetics Laboratory of
the Academy of Sciences in Tallinn, Estonia. Two tapes were used,
one for the taba - tapa - tappa set, the other for the sada - saada!
- saada set; each contained 252 stimuli. As there were 26 listeners
and each made 504 judgments, the data consist of 13,104 individual
judgments. The statistical evaluation of the materials is in progress;
however, some results are already available, and a preliminary survey
is given below.



2. Taba - tapa - tappa



The synthetic material was designed to test the ranges of /p/
durations that would be assigned to the three quantities, and the
contribution of second syllable duration to the perception of the
three test words. The duration of /p/ was varied in 10-msec steps
over a continuous range from 40 to 240 msec, giving twenty-one values.
Each of the 21 /p/ durations was combined with three durations for the second
vowel: 180 msec, 120 msec, and 90 msec. The duration of the first
vowel was kept constant at 120 msec; the fundamental frequency was
likewise constant (at 120 Hz). The total of 21 x 3 = 63 stimuli was
arranged in four different randomizations and presented to listeners,
who had to assign each stimulus to one of the three words taba,
tapa, or tappa. The listeners thus made a forced-choice linguistic
judgment rather than a phonetic judgment. Each listener gave 252
responses, for a total of 6,552 responses. The results of the
listening test are summarized in the following figures and tables.
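The stimulus design for the taba - tapa - tappa set can be reproduced as a small factorial grid. In the Python sketch below, the durations, F0, and counts come from the text. The dictionary layout and the `make_tape` helper are illustrative. So is the assumption that each of the four randomizations is an independent shuffle of all 63 stimuli.

```python
import random
from itertools import product

P_MS = list(range(40, 241, 10))  # intervocalic /p/: 40-240 msec in 10-msec steps
V2_MS = [180, 120, 90]           # second-vowel durations (msec)
V1_MS = 120                      # first vowel, held constant (msec)
F0_HZ = 120                      # monotone fundamental frequency
RANDOMIZATIONS = 4
LISTENERS = 26

# Every /p/ duration crossed with every second-vowel duration.
stimuli = [
    {"v1": V1_MS, "p": p, "v2": v2, "f0": F0_HZ}
    for p, v2 in product(P_MS, V2_MS)
]


def make_tape(items, repetitions, seed=None):
    """Concatenate independently shuffled copies of the stimulus list.

    Assumption: each randomization is a full shuffle of all stimuli.
    """
    rng = random.Random(seed)
    tape = []
    for _ in range(repetitions):
        block = list(items)
        rng.shuffle(block)
        tape.extend(block)
    return tape


tape = make_tape(stimuli, RANDOMIZATIONS, seed=1)
print(len(P_MS), len(stimuli), len(tape), len(tape) * LISTENERS)  # 21 63 252 6552
```

The printed counts reproduce the figures in the text: 63 stimuli, 252 items per tape, and 6,552 responses from 26 listeners.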

Table 1 and Figure 1 show the general effect of second syllable
duration on the assignment of the words to quantities one, two, and
three. It is obvious that a second syllable duration of 180 msec
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favors assignment to quantities one and two: the number of taba and
tapa responses is greatest under this condition. On the other hand,
a second syllable duration of 90 msec favors assignment of the word
to quantity three.

Tables 2-4 and Figures 2-4 show the number of judgments as
taba, tapa, or tappa as a function of the duration of intervocalic
/p/. Each of the three tables and figures represents judgments
associated with one of the three second syllable durations. The
discussion of the tables and the figures will be limited to a few
brief comments.

If we consider the crossing points of the curves representing taba,
tapa, and tappa judgments as 'phoneme boundaries' between quantities
1, 2, and 3 of the intervocalic consonant, then we note that the
phoneme boundary between /p/ in quantity 1 and /p/ in quantity 2
depends only slightly on the duration of the second vowel: with
decreasing second syllable duration, the boundary shifts from
approximately 110 msec for a second syllable of 180 msec to
105 msec for a second syllable of 120 msec, and to 100 msec for a
second syllable of 90 msec. However, the boundary between quantities
2 and 3 appears to be crucially affected by the duration of the second
syllable. Figure 2 shows that if the second syllable had a duration
of 180 msec, the boundary between tapa and tappa was at 225 msec,
and even at the longest duration, 240 msec, the differentiation
between long /p/ and overlong /p/ was very tenuous. With second
syllables of 120 and 90 msec, the boundary between long and overlong
intervocalic /p/ occurred at 175 and 170 msec, respectively.

3. Sada - saada! - saada

The set of test items designed to test the perception of quantity
in disyllabic words of the type sada - saada! - saada is a little
more complicated. This time there were three variables: duration of
the vowel of the first syllable, duration of the vowel of the second
syllable, and the fundamental frequency pattern distributed over the
two syllables. The duration of the first vowel took seven values in
20-msec steps from 120 to 240 msec, while the duration of intervocalic
/t/ was kept constant at 60 msec. Each of the first syllables was
combined with the same three second syllable durations as in the
previous case, namely 180 msec, 120 msec, and 90 msec. Furthermore,
each disyllabic stimulus was synthesized with three fundamental
frequency patterns: a level pattern (monotone at 120 Hz), a step-down
pattern (with the first syllable level at 120 Hz and the second
syllable level at 80 Hz), and a falling pattern (first syllable
falling from 120 Hz to 80 Hz, second syllable level at 80 Hz). The
total number of stimuli was again 7 x 3 x 3 = 63, the total number of
items on the randomized tape was 252, and the number of judgments
was 6,552.
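The three-factor design of the sada - saada! - saada set, and the judgment totals given in the Introduction, can be checked the same way. In this Python sketch the values come from the text. Encoding each F0 pattern as a (syllable 1 onset, syllable 1 offset, syllable 2 level) triple in Hz is an assumption.

```python
from itertools import product

V1_MS = list(range(120, 241, 20))  # first vowel: 120-240 msec in 20-msec steps
T_MS = 60                          # intervocalic /t/, held constant
V2_MS = [180, 120, 90]             # second-syllable vowel durations
# F0 patterns in Hz: (syllable 1 onset, syllable 1 offset, syllable 2 level)
F0_PATTERNS = {
    "level": (120, 120, 120),
    "step-down": (120, 120, 80),
    "falling": (120, 80, 80),
}
RANDOMIZATIONS = 4
LISTENERS = 26

stimuli = [
    {"v1": v1, "t": T_MS, "v2": v2, "f0": F0_PATTERNS[name], "pattern": name}
    for v1, v2, name in product(V1_MS, V2_MS, F0_PATTERNS)
]
items_on_tape = len(stimuli) * RANDOMIZATIONS
judgments = items_on_tape * LISTENERS
# Each listener heard two tapes (taba set and sada set) of 252 items each.
per_listener_both_sets = 2 * items_on_tape
print(len(V1_MS), len(stimuli), items_on_tape, judgments,
      per_listener_both_sets * LISTENERS)  # 7 63 252 6552 13104
```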

The results are presented in Tables 5-8 and Figures 5-11.
Again, only a few descriptive comments will be given this time.

Table 5 and Figure 5 show the influence of second syllable
duration and fundamental frequency pattern on the overall classification
of stimuli as sada, saada!, and saada. As is apparent from the left
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half of Figure 5, the influence of second syllable duration was
comparable to what was observed with the set taba - tapa - tappa:
a longer second syllable favored judgments for quantities 1 and
2 and disfavored judgments for quantity 3, while the shortest second
syllable substantially increased the number of quantity 3
judgments.

This effect is, however, rather limited compared to the influence
of the fundamental frequency pattern. As becomes apparent from
Figure 5, the monotone condition was relatively neutral. The step-down
pattern, with the first syllable level at 120 Hz and the second
syllable level at 80 Hz, produced the greatest number of quantity 2
judgments and the smallest number of quantity 3 judgments. It is
important to notice here that the step-down pattern actually decreased
quantity 1 judgments; for quantity 1, the monotone pattern was the
most favorable one.

Conversely, the falling pattern markedly increased the
number of quantity 3 judgments and decreased quantity 2 judgments.
This increase in quantity 3 judgments took place almost exclusively at the expense of
quantity 2, since the number of quantity 1 judgments remained
practically constant.

The phoneme boundaries for the duration of the first vowel are 
rather difficult to establish, since both the second syllable 
duration and especially the fundamental frequency pattern have such 
a strong influence on perception. Some of the problems are 
illustrated on the figures. 

Figure 6 shows the assignment of stimuli to quantities 1, 2 
and 3 with a second syllable of 180 msec and with a level fundamental 
frequency pattern. It may be recalled that these two conditions 
favor assignments to quantity 1 and disfavor assignments to quantity 
3. As is obvious from the figure, the overlap between quantities 1
and 2 occurs at approximately 160 msec, while the two curves representing
quantities 2 and 3 do not overlap at all. Even at the longest duration,
240 msec, 73 out of 104 judgments were still made in favor of quantity
2.

Figure 7 shows the number of judgments with the same second
syllable duration (180 msec) but with a falling fundamental frequency
pattern on the first syllable. As was mentioned before, this pattern 
favors assignments to quantity 3 and disfavors assignments to 
quantity 2, leaving quantity 1 practically unaffected. The phoneme 
boundary between quantities 1 and 2 has shifted only very slightly, 
from 160 msec to approximately 155 msec. It is now also possible to 
talk about a phoneme boundary between quantities 2 and 3: it would 
fall at about 210 msec. 

Figure 8 shows assignments to the three quantities with a short 
second syllable (90 msec) and monotone fundamental frequency. As 
may be remembered, the short second syllable favors assignments to 
quantity 3, while the monotone fundamental frequency pattern is
relatively neutral. A characteristic of all three curves is the
extensive overlap between them and the fact that all three curves peak
at approximately 75%. The reliability of recognition here obviously
was not very great; the phoneme boundaries, however, seem not to have
been affected.
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Figure 9 shows assignments to the three quantities under
conditions maximally favoring quantity 3: a short second syllable
(90 msec) and a falling fundamental frequency pattern. The reduction
of the number of quantity 2 judgments is particularly striking: even
at the 160 msec duration, which produced the greatest number of
quantity 2 judgments, their number did not exceed 64 (out of 104).
The phoneme boundary between quantities 1 and 2 is not affected, but
the boundary between quantities 2 and 3 has now shifted from 210 to
175 msec. The peak of the curve has shifted from 180 msec with level
fundamental frequency (Figure 8) to 160 msec.

Figures 10 and 11 summarize the influence of fundamental frequency 
patterns on assignment to quantities 2 and 3. The second syllable 
in these two sets of examples was constant at the most neutral, 
intermediate value, namely at 120 msec. 

Figure 10 shows assignments to quantity 2. It is obvious that 
the left-hand slope of the curve depends very little on the fundamental 
frequency pattern: the phoneme boundary between quantities 1 and 2 
is barely affected by the fundamental frequency. On the other hand, 
the position of the peak and the phoneme boundary of quantity 2 with 
regard to quantity 3 are both strongly affected: the peak shifts
from about 210 msec with the step-down curve to 180 msec for the monotone
and to 160 msec for the falling pattern.

The converse situation appears on Figure 11, which shows the 
influence of fundamental frequency on assignments to quantity 3. 

Here the neutral pattern produced the smallest number of assignments, 
the step-down pattern increased the number of quantity 3 judgments 
somewhat (although the curve never reached 70%), and the falling 
pattern both steepened the slope of the curve and made it reach a 
higher peak. It should be noted that even with the falling fundamental 
frequency pattern the highest number of quantity 3 judgments was 90
out of 104. The peak value for quantity 3 judgments for the whole
set of conditions was reached when both conditions were met: the
fundamental frequency had a falling pattern and the second syllable
was short.

Let me now briefly summarize the status of the experiments. I am
currently working out the statistical design for testing the
significance of the relationships displayed in these tables and
figures. I intend to compute correlations between the variables and
the judgments and to establish the relative contribution of each
variable. Until this part of the project is completed, the results
remain somewhat impressionistic. Nevertheless, it is possible to draw
some tentative generalizations.
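The significance tests themselves are not specified here. As one illustration only (not the design the author intends), a Pearson chi-square test of independence can be run on the counts of judgments by fundamental frequency pattern given in Table 5. The sketch assumes that every judgment is an independent observation, which repeated responses from the same listeners do not strictly satisfy; the function name `chi_square_independence` is hypothetical.

```python
# Counts from Table 5 (second syllable durations combined): rows are the
# fundamental frequency patterns (Hz), columns are judgments as
# sada (quantity 1), saada (quantity 2) and saada (quantity 3).
F0_TABLE = {
    "120-120/120": [669, 1096, 419],
    "120-120/80": [605, 1326, 253],
    "120-80/80": [608, 688, 888],
}


def chi_square_independence(rows):
    """Return the Pearson chi-square statistic and degrees of freedom
    for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if judgment were independent of the F0 pattern.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_square_independence(list(F0_TABLE.values()))
print(f"chi-square = {chi2:.1f}, df = {df}")
```

The statistic would be referred to a chi-square distribution with 4 degrees of freedom; SciPy's `scipy.stats.chi2_contingency` performs the same computation and also returns a p-value.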

First of all, I think it is clear that the assignment of a word
to a quantity depends not only on the duration of a first-syllable
vowel or an intervocalic consonant, but also on the duration of the
second syllable and on the fundamental frequency pattern applied to
the word as a whole. If one defines the point of overlap between two
distribution curves as the boundary between two phonemic quantities,
one may claim that the placement of these boundaries depends
substantially on both second syllable duration and fundamental
frequency. I believe that this observation lends support to the
notion that what we are dealing with is a higher-level suprasegmental
pattern distributed over the whole disyllabic word, rather than
independently functioning segmental quantity.
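The definition of a phonemic boundary as the point of overlap between two distribution curves can be computed directly from the tables. The sketch below uses the counts of Table 3 and assumes straight-line interpolation between neighboring stimuli; the function `crossover` is a hypothetical helper, not the procedure used in this study.

```python
# Table 3 (second syllable constant at 120 msec), 104 judgments per stimulus:
# /p/ duration in msec -> (taba, tapa, tappa)
TABLE_3 = {
    40: (104, 0, 0), 50: (103, 1, 0), 60: (102, 2, 0), 70: (99, 5, 0),
    80: (96, 8, 0), 90: (83, 21, 0), 100: (58, 45, 1), 110: (33, 71, 0),
    120: (4, 97, 3), 130: (1, 98, 5), 140: (3, 97, 4), 150: (0, 82, 22),
    160: (0, 81, 23), 170: (0, 73, 31), 180: (0, 31, 73), 190: (0, 27, 77),
    200: (0, 14, 90), 210: (0, 8, 96), 220: (0, 3, 101), 230: (0, 3, 101),
    240: (0, 0, 104),
}
TABA, TAPA, TAPPA = 0, 1, 2


def crossover(table, a, b):
    """Duration at which curve a first drops to or below curve b,
    linearly interpolated between adjacent stimuli (None if it never does)."""
    durations = sorted(table)
    for lo, hi in zip(durations, durations[1:]):
        d_lo = table[lo][a] - table[lo][b]
        d_hi = table[hi][a] - table[hi][b]
        if d_lo > 0 and d_hi <= 0:
            return lo + (hi - lo) * d_lo / (d_lo - d_hi)
    return None


print(f"taba/tapa boundary:  {crossover(TABLE_3, TABA, TAPA):.1f} msec")
print(f"tapa/tappa boundary: {crossover(TABLE_3, TAPA, TAPPA):.1f} msec")
```

With these counts, the sketch places the taba/tapa crossing at about 102.5 msec and the tapa/tappa crossing at 175 msec of /p/ duration.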

It is interesting, furthermore, that the boundary between 
quantities 2 and 3 is more strongly affected by the pattern applied to 
the word as a whole than the boundary between quantities 1 and 2. 

In a very tentative sense, one might find support here for the idea 
that the older two-way opposition between short and long is more 
firmly segmentally anchored than the relatively new three-way 
opposition between short, long and overlong. The older opposition 
is mainly segmental; the newer three-way opposition is mainly based 
on differences between patterns manifested over the whole disyllabic 
word. The implications of these results will become clearer when 
the statistical analysis is complete. 



Footnotes 

¹The DDP-24 computer is a machine of medium size (12K) and
speed (5 microseconds). The synthesis programs were written by
B. E. Caspers (B. E. Caspers, "Software Facilities and Operating
System of a DDP-224 Computer," Bell Telephone Laboratories, Murray
Hill, N.J., 1968). I am grateful to Dr. P. B. Denes, Head of the
Speech and Communication Research Department, Bell Telephone
Laboratories, for his assistance.

²I am indebted to Mr. Kullo Vende for his invaluable help in
arranging the listening sessions. I would also like to thank
all individuals who participated in the listening tests.
Table 1. Judgments depending on second syllable duration.

Duration of       taba    tapa   tappa   Total
V2 in msec
180                784    1090     310    2184
120                686     767     731    2184
90                 656     731     797    2184
Total             2126    2588    1838    6552



Table 2. Judgments depending on the duration of /p/
V2 = 180 msec

Duration of       taba    tapa   tappa
/p/ in msec
40                 104
50                 103       1
60                 104
70                 104
80                 103       1
90                  97       7
100                 78      26
110                 50      54
120                 26      78
130                  9      93       2
140                  3     100       1
150                  2     102
160                         98       6
170                         93      11
180                         92      12
190                  1      80      23
200                         71      33
210                         61      43
220                         56      48
230                         45      59
240                         33      71
Total              784    1090     310
Table 3. Judgments depending on the duration of /p/
V2 = 120 msec

Duration of       taba    tapa   tappa
/p/ in msec
40                 104
50                 103       1
60                 102       2
70                  99       5
80                  96       8
90                  83      21
100                 58      45       1
110                 33      71
120                  4      97       3
130                  1      98       5
140                  3      97       4
150                         82      22
160                         81      23
170                         73      31
180                         31      73
190                         27      77
200                         14      90
210                          8      96
220                          3     101
230                          3     101
240                                104
Total              686     767     731
Table 4. Judgments depending on the duration of /p/
V2 = 90 msec

Duration of       taba    tapa   tappa
/p/ in msec
40                 104
50                 102       2
60                 103       1
70                 100       4
80                  89      15
90                  67      35       2
100                 53      51
110                 31      12       1
120                  5      91       8
130                         97       7
140                         98       6
150                         84      20
160                  1      76      21
170                         50      54
180                         23      81
190                  1      22      81
200                         11      93
210                          5      99
220                          1     103
230                                104
240                          1     103
Total              656     731     797
Table 5

Judgments depending on second syllable duration (fundamental frequency
patterns combined)

Duration of       sada   saada   saada   Total
V2 in msec        (Q1)    (Q2)    (Q3)
180                717    1114     353    2184
120                596    1054     534    2184
90                 569     942     673    2184
Total             1882    3110    1560    6552


Judgments depending on fundamental frequency pattern (second syllable
durations combined)

F0 pattern        sada   saada   saada   Total
(Hz)              (Q1)    (Q2)    (Q3)
120-120/120        669    1096     419    2184
120-120/80         605    1326     253    2184
120-80/80          608     688     888    2184
Total             1882    3110    1560    6552
Table 6. Judgments depending on first syllable duration and fundamental
frequency pattern (second syllable duration constant at 180 msec)

F0 pattern        V1    sada   saada   saada   Total
(Hz)          (msec)    (Q1)    (Q2)    (Q3)
120-120/120      120     101       3
                 140      89      15
                 160      52      51       1
                 180      17      84       3
                 200       1      93      10
                 220              87      17
                 240              73      31
               Total     260     406      62     728
120-120/80       120      96       8
                 140      85      16       3
                 160      42      57       5
                 180      10      84      10
                 200       3      94       7
                 220              89      15
                 240       1      75      28
               Total     237     423      68     728
120-80/80        120      99       5
                 140      72      31       1
                 160      41      58       5
                 180       5      78      21
                 200       2      60      42
                 220       1      45      58
                 240               8      96
               Total     220     285     223     728
Grand total              717    1114     353    2184
Table 7. Judgments depending on first syllable duration and fundamental
frequency pattern (second syllable duration constant at 120 msec)

F0 pattern        V1    sada   saada   saada   Total
(Hz)          (msec)    (Q1)    (Q2)    (Q3)
120-120/120      120      95       8       1
                 140      77      27
                 160      23      72       9
                 180      10      82      12
                 200       2      77      25
                 220       1      61      42
                 240       1      34      69
               Total     209     361     158     728
120-120/80       120      96       8
                 140      78      25       1
                 160      17      83       4
                 180       7      90       7
                 200              92      12
                 220              92      12
                 240       1      70      33
               Total     199     460      69     728
120-80/80        120      87      15       2
                 140      69      33       2
                 160      17      75      12
                 180      10      58      36
                 200       1      27      76
                 220       3      12      89
                 240       1      13      90
               Total     188     233     307     728
Grand total              596    1054     534    2184
Table 8. Judgments depending on first syllable duration and fundamental
frequency pattern (second syllable duration constant at 90 msec)

F0 pattern        V1    sada   saada   saada   Total
(Hz)          (msec)    (Q1)    (Q2)    (Q3)
120-120/120      120      74      26       4
                 140      76      27       1
                 160      32      63       9
                 180      14      77      13
                 200       3      68      33
                 220       1      40      63
                 240              28      76
               Total     200     329     199     728
120-120/80       120      78      25       1
                 140      58      44       2
                 160      22      78       4
                 180       9      86       9
                 200              87      17
                 220       1      69      34
                 240       1      54      49
               Total     169     443     116     728
120-80/80        120      87      17
                 140      79      19       6
                 160      15      64      25
                 180      14      37      53
                 200       1      20      83
                 220       2       8      94
                 240       2       5      97
               Total     200     170     358     728
Grand total              569     942     673    2184
Figure 1. Number of judgments as taba, tapa, or tappa, expressed as a
function of the duration of the second syllable.

Figure 2. Number of judgments as taba, tapa, or tappa, expressed as a
function of the duration of intervocalic /p/. Duration of the second
syllable was constant at 180 msec.
Figure 11. Number of judgments as saada (quantity 3), expressed as a
function of the duration of the first syllable and the fundamental
frequency pattern.
Phonological Rules in Lithuanian and Latvian*

Zinny S. Bond



Introduction 



Lithuanian and Latvian are quite closely related languages, Latvian 
traditionally being considered the more innovating of the two. The two 
languages present an ideal case for a comparison of their grammars in 
terms of shared phonological rules.

Ideally, two independently developed grammars of the two languages
would be compared. However, though there is an extensive treatment of
Latvian phonology in a generative framework (Halle and Zeps, 1966),
recent work on Lithuanian has been primarily concerned with the
analysis of accent. Only Heeschen (1967) has considered other
phonological phenomena, and his treatment of Lithuanian phonology is
also primarily concerned with accent assignment.

I will simply assume that the analysis of Latvian phonology is
basically sound and see which of the Latvian rules are applicable in
Lithuanian. If the rules developed for Latvian can also be shown to
operate in Lithuanian, then the rules in question can be established as
shared by the two languages. The interesting questions in this comparison
concern not so much the fact of shared rules as the place of innovations
in the two grammars, as well as changes in the form and applicability
of rules.

This paper will be limited to rules primarily involved in the
derivation of verbs, though obviously some of the rules are quite general.
First, the rules developed by Halle and Zeps for Latvian will be surveyed
briefly. Then, each rule will be examined to see whether, and in what
form, it applies to Lithuanian. Some Lithuanian rules will also be
discussed. Finally, differences between the two sets of rules will be
analyzed.



The Latvian Rules 

The fundamental phonological processes have been described by
Halle and Zeps (Halle and Zeps, 1966; Zeps, 1970). I will describe the
rules they have developed and add, for clarity, a few examples of their
application. The notation is informal; examples are given in traditional
orthography.



*This paper was written in the summer of 1970 while the author held an 
NDEA Title VI Fellowship. 
1. k/c rule

Velar stops are replaced by dental affricates before front vowels:

    k, g → c, dz / ___ front vowel

saku 'I say', sacīšu 'I will say'
ruaka 'hand', ruaciņa 'hand' (diminutive)

2. i/j rule

The rule defines alternations of long vowels with sequences of
vowel plus v or j:

    i, u → j, v / ___ (+) V

šūt 'to sew', šuvu 'I sewed'
līt 'to rain', lija 'it rained'

3. n/i rule

The sequence vowel plus n becomes the sequence vowel plus i or u:

    n → i / front vowel ___
    n → u / back vowel ___

The rule accounts for two types of alternations. First, a long vowel
can alternate with a vowel + n sequence, as in dzinu 'I drove' and
dzīt 'to drive'. Second, the rule provides some of the inputs to
the metathesis rule, thereby accounting for alternations of the form
pruatu 'I know how', pratu 'I knew how'. In the second case, the n
never appears on the surface.
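The n/i rule and its interaction with metathesis and syncope can be illustrated as ordered string rewrites. This is a minimal sketch under several assumptions: a simplified vowel inventory, the environment "before a consonant" (which the conclusion attributes to the Latvian rule), metathesis reduced to au → ua, and hypothetical underlying forms /dzin + t/, /dzin + u/ and /pran + t + u/ that are not taken from Halle and Zeps.

```python
import re

FRONT = "eiæ"  # front vowels (simplified inventory, an assumption)
BACK = "aou"
VOWELS = FRONT + BACK
LONG = {"a": "ā", "e": "ē", "i": "ī", "u": "ū"}


def n_i_rule(form):
    """n -> i after a front vowel, n -> u after a back vowel, before a consonant."""
    def vocalize(m):
        vowel = m.group(1)
        return vowel + ("i" if vowel in FRONT else "u")
    return re.sub(rf"([{VOWELS}])n(?=\+?[^{VOWELS}+])", vocalize, form)


def metathesis(form):
    """Reduced to the single case au -> ua needed here."""
    return form.replace("au", "ua")


def syncope(form):
    """Two identical short vowels -> one long vowel."""
    return re.sub(r"([aeiu])\1", lambda m: LONG[m.group(1)], form)


def derive(underlying):
    form = underlying
    for rule in (n_i_rule, metathesis, syncope):
        form = rule(form)
    return form.replace("+", "")  # drop morpheme boundaries


for form in ("dzin+t", "dzin+u", "pran+t+u"):
    print(form, "->", derive(form))
```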

4. e/e rule

Open e is raised to closed e before i or j; any number of such vowels
in a word will be raised as long as there is no intervening back vowel.

    open e → closed e / ___ i, j

ecētu 'I would harrow', ecēsi 'you will harrow'

5. Metathesis

Except where specifically blocked, metathesis applies unconditionally
to all sequences of the appropriate shape. In spite of the notation,
there are only two possible outputs of the metathesis rule, represented
as ie and ua in the traditional orthography. The second element of
these diphthongs is a mid or low central vowel of rather obscure
quality: [ə] or even [a].

skrien 'he runs', skrēja 'he ran'
duad 'he gives', deva 'he gave'



6. Ablaut

e alternates with i in non-present tense forms:

    e → i   in non-present tense forms



7. Vowel truncation

    V → ∅ / ___ #

Vowel truncation is quite well motivated, although the details of the
rule depend on assumptions about the underlying representations more
than is the case for most rules. The need for a truncation rule, however,
is shown by many alternations: for example, augu 'I grow' vs. audz
'you grow', from /aug + i/.



8. Syncope

The syncope rule converts a sequence of two identical vowels to
a long vowel:

    V + V → V̄   (the two vowels identical)

Both the n/i rule and the i/j rule indicate that it is advantageous to
treat surface long vowels as sequences of identical short vowels. This
treatment, however, requires the syncope rule to convert the vowel
sequence to a long vowel.

9. Vowel lengthening

Under rather complicated conditions, the stem vowel of verbs is lengthened:

    V → V̄   in the past-tense forms of verbs with a palatal
             present-tense infix

ceļu 'I lift', cēlu 'I lifted'
kauju 'I kill', kāvu 'I killed'

The remaining rules are termed 'lower level' phonological rules by
Halle and Zeps, but they do not specify the criteria for this
distinction.

10. Spirantization

Dental stops become spirants before t:

metu 'I threw', mest 'to throw', from /met + t/

11. Dental mutation

lācis 'bear' nom. sing., lāča 'bear' gen. sing., from /lāc +
(cf. gulbis 'swan' nom. sing., gulbja gen. sing.)



12. j loss

    j → ∅ / palatal consonant ___

    /lāčj + a/ → lāča



13. Voicing assimilation

All obstruent clusters are either voiceless or voiced, depending
on the voicing of the last element:

    [+obstruent] → [α voice] / ___ [+obstruent, α voice]
The following are some sample derivations of Latvian verbs. In
the underlying representation, the verb is composed of a verb stem, an
optional tense marker, and a person ending. Many verbs have special
tense infixes as well. For example, the -tt- in klīst is the underlying
representation of the traditional -st- present-tense infix of Baltic
verbs.

    lænk + au
    læik + au          n/i rule
    liæk + ua          metathesis
    liæku              vowel truncation
    lieku 'I put'      in the orthography

    lænk + æ + i
    lænc + æ + i       k/c rule
    læic + æ + i       n/i rule
    liæc + i + æ       metathesis
    liæc               vowel truncation (morpheme boundaries are
                       inserted to enable the rule to apply twice)
    liec 'you put'     in the orthography

    kliid + tt + a
    kliid + tt         vowel truncation
    klīd + tt          syncope
    klīz + st          spirantization
    klīst              voicing assimilation (and contraction of
                       identical spirants)
    klīst 'he strays'
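The klīst derivation depends on the order in which the rules apply. The sketch below applies regular-expression versions of four rules in the order given above; it is a simplification that covers only the segments in these examples (for instance, truncation deletes a vowel only when it is a whole final morpheme), not a general implementation of Halle and Zeps's rules, and the helper names are hypothetical.

```python
import re

LONG = {"a": "ā", "e": "ē", "i": "ī", "u": "ū"}


def vowel_truncation(form):
    """Delete a vowel that makes up the final morpheme (simplified)."""
    return re.sub(r"\+[aeiou]$", "", form)


def syncope(form):
    """Two identical short vowels -> one long vowel."""
    return re.sub(r"([aeiu])\1", lambda m: LONG[m.group(1)], form)


def spirantization(form):
    """A dental stop becomes a spirant before t: d -> z, t -> s."""
    return re.sub(r"[td](?=\+?t)", lambda m: {"t": "s", "d": "z"}[m.group(0)], form)


def voicing_assimilation(form):
    """Only the case needed here: z devoices before s or t, and the
    resulting identical spirants contract to one."""
    form = re.sub(r"z(?=\+?[st])", "s", form)
    return re.sub(r"s\+?s", "s", form)


RULES = [vowel_truncation, syncope, spirantization, voicing_assimilation]


def derive(underlying, trace=False):
    form = underlying
    for rule in RULES:
        new_form = rule(form)
        if trace and new_form != form:
            print(f"{new_form:12} {rule.__name__}")
        form = new_form
    return form.replace("+", "")


print(derive("kliid+tt+a", trace=True))
```

Running voicing_assimilation before spirantization would leave klīzst instead of klīst, so the output depends on the order of RULES.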



Lithuanian Counterparts of Latvian Rules 



Before discussing the Lithuanian counterparts of the Latvian rules,
it is necessary to say a few words about the underlying representations 
that have been selected for Lithuanian. In general, the representations 
of verb stems will be selected to be as close as possible to the Latvian 
representations, whenever a particular verb has a cognate in Latvian. 

Long vowels will be analyzed as a sequence of two short vowels, even 
though this analysis may complicate accent assignment; Lithuanian accent 
rules will be ignored. 

The present-tense person endings have been selected on the basis
of the person endings that appear with the reflexive verbs, where the
endings are protected by a consonant from vowel truncation. There are
two sets of past-tense person endings. Though these endings are apparently
predictable, at least in part, in this paper verbs will simply be
considered to be marked [+ -aa past] or [+ -ee past] and be assigned
the appropriate person endings on this basis. As in Latvian, many verbs
have special tense infixes.

Of the Latvian rules discussed, at least seven also appear in
Lithuanian.

1. i/j rule

The i/j rule is identical in Lithuanian and Latvian. For example,
the rule is needed in the derivation of the verb žūti 'perish', with
the present tense žūsta and the past tense žuvo. The stem can be
represented as /žuu-/; the infinitive is formed from /žuu + ti/. The
present-tense forms take the -st- infix: /žuu + tt + a/ → žūsta. In the
past-tense forms, the second vowel of the stem precedes another vowel,
so the i/j rule applies: /žuu + aa/ → žuvaa, and žuvo by a later rule.
Similarly, gyti 'heal' has the present tense formed with a palatal
infix: /gii + i + a/ → gyja; in the past tense, the second stem vowel
directly precedes another vowel, so the i/j rule applies:
/gii + aa/ → gijo.

As in Latvian, v and j can be regarded as realizations of underlying
u and i; for example, verbs like dvėsti (dvesia, dvėsė) 'die' can be
entered in the lexicon as /dues + ti/; the i/j rule will produce the
correct output.

The Lithuanian rule can be formulated to be exactly like the
Latvian rule:

    i, u → j, v / ___ (+) V
There are some exceptions to the i/j rule. First, there is the
general constraint, shared by Latvian, that the first vowel in a
sequence of identical vowels is exempt from the i/j rule. Second, a
few verbs behave anomalously with respect to the rule; for example,
guiti 'chase' keeps both vowels in the infinitive instead of having
the form predicted by the i/j rule, *gviti. Since the exceptions
appear to be few, however, they can simply be marked [-i/j rule].

Palatalized and non-palatalized (hard) consonants can contrast
only before back vowels; elsewhere, consonants are always palatalized
before front vowels and hard in other environments. In the traditional
orthography, palatalization before back vowels is represented by -i-;
this device can be employed in the underlying representations as well.
For example, [k'aušas] 'skull' would have an underlying representation
something like /kiauš + as/. The i/j rule would produce /kjauš + as/;
consonants preceding j or front vowels become palatalized, and the j
can then be dropped. Thus there is no difficulty with -i- as a marker
of palatalization. This, of course, simplifies the description of the
language, since palatalization can be predicted entirely by rule.

2. n/i rule

The n/i rule has no direct counterpart in Lithuanian, but there
are alternations of long vowels with vowel-nasal sequences, for example
žįsti, žinda, žindo 'suck' and bręsti, bręsta, brendo 'mature'. Under
rather complex conditions, the nasal of the underlying vowel-nasal
sequence vocalizes, creating a sequence of nasalized vowels.
Subsequently, all vowels become de-nasalized. Heeschen discusses these
alternations, giving the required rule in a form essentially similar to
the following:

    V (V) n → V (V) Ṽ / ___ {s, š, z, ž, j, v, l, m, r}

where the vocalized nasal Ṽ is a nasalized vowel identical to the
vowel preceding it.

3. Metathesis

The metathesis rule follows the i/j rule and is also required
in Lithuanian phonology. In Latvian, the metathesis rule applies
to a great many verbs; in Lithuanian, however, metathesis is a
rather minor rule. It can be motivated only for au and ei, not, as
far as I can tell, for any of the other sequences that are also
subject to metathesis in Latvian. The verb duoti 'to give' requires
both metathesis and the i/j rule in its derivation: /dau + ti/ becomes
duoti and /dau + d + a/ becomes duoda by metathesis; /dau + ee/ becomes
davė by the i/j rule.

Some verb stems ending in obstruents have to be entered in the
pre-metathesis form to prevent the i/j rule from applying; for example,
liepti 'to order' would have the underlying representation /leip-/.
Thus the environment for the i/j rule is not supplied, and metathesis
provides the correct form.

A very large number of verbs, however, are exceptions to metathesis,
e.g. klausti 'ask', geisti 'desire', keikti 'curse', krauti 'heap up',
leisti 'let', etc. Therefore, it may be more economical to mark the verb
stems that undergo metathesis and to consider the exceptions as normal,
rather than to specify the exceptions to metathesis. The metathesis
rule would still apply to person endings, however. The unmarked state
would be for metathesis to apply to person endings and not to apply to
verb stems.
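The derivations of duoti, duoda and davė show why the i/j rule must precede metathesis. The sketch below is a simplification under stated assumptions: metathesis is applied to every au and ei (ignoring the lexical marking just discussed, so it would wrongly change klausti), only a few vowels are modeled, and the function names are hypothetical.

```python
import re

VOWELS = "aeiouyė"  # simplified vowel inventory (an assumption)


def ij_rule(form):
    """i, u -> j, v before a vowel; the first of two identical vowels is exempt."""
    return re.sub(rf"([iu])(?!\1)(?=\+?[{VOWELS}])",
                  lambda m: {"i": "j", "u": "v"}[m.group(1)], form)


def metathesis(form):
    """au -> ua and ei -> ie (applied everywhere in this sketch)."""
    return form.replace("au", "ua").replace("ei", "ie")


def vowel_raising(form):
    """Long aa -> o, long ee -> ė."""
    return form.replace("aa", "o").replace("ee", "ė")


def orthography(form):
    """Spell ua as uo and drop morpheme boundaries."""
    return form.replace("ua", "uo").replace("+", "")


def derive(underlying):
    form = underlying
    for rule in (ij_rule, metathesis, vowel_raising, orthography):
        form = rule(form)
    return form


for form in ("dau+ti", "dau+d+a", "dau+ee", "leip+ti", "žuu+aa"):
    print(form, "->", derive(form))
```

Because ij_rule runs first, /dau + ee/ loses its au sequence before metathesis can apply, which yields davė rather than a metathesized form.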



4. Ablaut and vowel lengthening

Since ablaut and vowel lengthening are both morphologically
conditioned rules, the two rules will be discussed together. Lithuanian
has an ablaut rule very similar to the Latvian rule:

    e → i / ___ {l, m, n, r}   in non-present tense forms

For example, pirkti, perka, pirko 'buy'.

There are at least two rules lengthening vowels. The rule found
in Latvian, lengthening vowels in the past tense, also operates in
Lithuanian, as exemplified by verbs like minti, mynė 'tread'; pinti,
pynė 'wreathe'; durti, dūrė 'stab'; grumti, grūmė 'combat'.

When the stem vowel -a- is lengthened in the past tense, it is
subsequently raised to -o-; similarly, when -e- is lengthened, it
is raised to -ė-. For example, karti, korė 'hang'; plauti, plovė
'wash'; kelti, kėlė 'lift'.

The vowel lengthening rule can be formulated to be very much
like the Latvian rule:

    V → V̄   in past-tense forms, when the verb takes [+ -ee past]
             tense



All of the verbs showing lengthening in the past tense take the [-ee]
past tense, with the exception of eiti (eina, ėjo) 'go'. It would not
be surprising, however, if this verb were irregular, specially marked
to undergo the lengthening rule. The rule must also be prevented from
applying to verbs like auti 'to put on shoes', with the past-tense form
avė instead of *ovė, as predicted by the lengthening rule.

Many verbs that show lengthening in the past-tense forms also have
a nasal present-tense infix, rather than the palatal infix that appears
in Latvian, e.g. griauti, griauna, griovė 'thunder'; rauti, rauna, rovė
'tear out'; šauti, šauna, šovė 'shoot'. But this is not true of all
verbs showing lengthening in the past tense.
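Past-tense lengthening, the later raising of long a and e, and the i/j rule can be chained in the same way. In this sketch the stems /kar-/, /kel-/, /min-/, /dur-/ and /plau-/ are hypothetical underlying forms, the i/j rule is reduced to the case of i or u directly before a boundary and a vowel, and exceptions such as auti (avė) are not handled; the function names are hypothetical.

```python
import re


def lengthen(stem):
    """Double the first vowel of the stem (V -> VV)."""
    return re.sub(r"([aeiou])", r"\1\1", stem, count=1)


def ij_rule(form):
    """Simplified i/j rule: i, u -> j, v directly before + and a vowel."""
    return re.sub(r"([iu])(?=\+[aeiou])",
                  lambda m: {"i": "j", "u": "v"}[m.group(1)], form)


def raise_vowels(form):
    """Vowel raising: long aa -> o, long ee -> ė."""
    return form.replace("aa", "o").replace("ee", "ė")


def spell(form):
    """Long ii and uu are written y and ū; boundaries are dropped."""
    return form.replace("ii", "y").replace("uu", "ū").replace("+", "")


def ee_past(stem):
    """Past tense of a verb marked [+ -ee past]."""
    return spell(raise_vowels(ij_rule(lengthen(stem) + "+ee")))


for stem in ("kar", "kel", "min", "dur", "plau"):
    print(stem, "->", ee_past(stem))
```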

Vowel lengthening takes place in the present tense, rather than in
the past, in another set of verbs. All these verbs have -i- or -u- as
the stem vowel, and all take the [-aa] past-tense endings. For example,
dilti, dyla, dilo 'wear away'; dusti, dūsta, duso 'suffocate'.

Apparently, present-tense lengthening does not take place before
resonants: krinta 'he falls', miršta 'he dies'.

The rule can be formulated as follows:

    V[+high] → [+long] / ___ [+obstruent]   in the present tense, when
                                             the verb is marked
                                             [+ -aa past] tense

Finally, there is a class of verbs with long stem vowels that
lower the stem vowel in the present tense: dėti, deda, dėjo 'put';
dvėsti, dvesia, dvėsė 'die'. I cannot formulate the rule for vowel
lowering, however, because I cannot specify the conditions under which
the change takes place; some verb stems of essentially identical
phonological shape and morphological composition to those listed above
do not undergo the rule, e.g. grėbti, grėbia, grėbė 'rake'.

5. Vowel truncation

The vowel truncation rule is difficult to evaluate because, more
than other rules, its formulation depends on other components of the
analysis. However, the most economical description seems to call for
vowel truncation in Lithuanian. In Latvian, of course, vowel
truncation is very widespread; in fact, loss of vowels in final
syllables is one of the major, traditionally cited Latvian innovations.

Vowel truncation in Lithuanian can be motivated if the person
endings that show up in the reflexive, where they are protected by a
consonant, are considered to appear in the active as well. For example:

    lenki + au
    lenkj + au         i/j rule
    lenkj + ua         metathesis
    lenkj + u          vowel truncation

Finally, the result is lenkiu [l'eŋk'u] 'I bend'.

In the reflexive paradigm, the person endings are protected, and the
reflexive form shows the full person ending: lenkiuosi 'I bow (I bend
myself)'.

Heeschen formulates the rule quite simply:

    V → ∅ / ___ (s)#

However, he has to exclude the rule from several morphological
environments, including the reflexive marker -si, and to postulate extra
vowels to protect some endings. Lithuanian vowel truncation is therefore
not nearly as simple as the rule implies.

Three of the 'lower level' Latvian phonological rules are shared
by Lithuanian: spirantization, voicing assimilation, and dental
mutation.

6. Spirantization

Lithuanian has a spirantization rule which is identical to the
Latvian one. For example, /met + ti/ results in mesti 'to throw'.

7. Voicing assimilation

Similarly, Latvian and Lithuanian share a voicing assimilation
rule, assimilating all obstruents in a cluster to the voicing of the
last member of the cluster. For example, bėgti 'to run' is phonetically
[b'e:kt'i].
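Voicing assimilation can be sketched as a right-to-left scan over the segments, so that the voicing of the last obstruent spreads through the whole cluster. The sketch assumes a small set of obstruent pairs (p/b, t/d, k/g, s/z, š/ž), ignores affricates and f, and uses a hypothetical function name.

```python
VOICELESS = "ptksš"
VOICED = "bdgzž"
TO_VOICED = dict(zip(VOICELESS, VOICED))
TO_VOICELESS = dict(zip(VOICED, VOICELESS))
OBSTRUENTS = set(VOICELESS + VOICED)


def assimilate_voicing(form):
    """Give every obstruent the voicing of the obstruent that follows it.

    Scanning from right to left lets the voicing of the last member of a
    cluster spread through the whole cluster."""
    segments = list(form.replace("+", ""))
    for i in range(len(segments) - 2, -1, -1):
        current, following = segments[i], segments[i + 1]
        if current in OBSTRUENTS and following in OBSTRUENTS:
            if following in VOICED:
                segments[i] = TO_VOICED.get(current, current)
            else:
                segments[i] = TO_VOICELESS.get(current, current)
    return "".join(segments)


print(assimilate_voicing("bėg+ti"))
```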

8. Dental mutation

The Latvian dental mutation rule has a very limited counterpart
in Lithuanian; for example, skaičiau 'I read' and skaitei 'you read'.



Lithuanian 'Lower Level' Rules

The verb system of Lithuanian requires a number of 'low level'
phonological processes that do not operate in Latvian.

1. Obstruent metathesis

There is an obstruent metathesis rule, exemplified by verbs like
blokšti, bloškia, bloškė 'hit' and drėksti, drėskia, drėskė 'scratch'.
Apparently, stem-final spirants and velar stops metathesize. That this
metathesis takes place only before consonants is indicated by the
following two verbs: blokšti 'to hit' and blykšti 'to turn pale'. The
present-tense form of blokšti is bloškia; it is derived from
/blaški + a/ without undergoing obstruent metathesis. The present tense
of blykšti, however, is blykšta; the underlying representation is
/bliišk + tt + a/. Because the obstruent cluster precedes the -st-
infix, the cluster is subject to metathesis. In the past tense, the
cluster appears before a vowel, and so appears in the pre-metathesis
form: blyško. The rule may be formulated as follows:

    spirant + velar stop → velar stop + spirant / ___ + C
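Obstruent metathesis can be sketched as a single regular-expression rewrite that follows the description above: the stem-final spirant and velar stop exchange places only when a consonant follows the morpheme boundary. The underlying forms /drėsk-/ and /blyšk-/ are hypothetical simplifications (the text writes /bliišk/ with a double vowel), "consonant" is approximated as "not a vowel", and the function names are hypothetical.

```python
import re

SPIRANTS = "sšzž"
VELAR_STOPS = "kg"
VOWELS = "aeiouyėąęįųū"


def obstruent_metathesis(form):
    """Spirant + velar stop -> velar stop + spirant before + and a consonant."""
    return re.sub(rf"([{SPIRANTS}])([{VELAR_STOPS}])(?=\+[^{VOWELS}+])",
                  r"\2\1", form)


def surface(form):
    return obstruent_metathesis(form).replace("+", "")


for form in ("drėsk+ti", "drėsk+ė", "blyšk+ti", "blyšk+o"):
    print(form, "->", surface(form))
```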

2. Nasal metathesis

Seemingly related to obstruent metathesis is metathesis of the
nasal infix with the last element of the stem when the stem ends in
an obstruent. This is exemplified by verbs like the following: kristi,
krinta, krito 'fall'; (pa-)tikti, tinka, tiko 'like'; klupti, klumpa,
klupo 'trip'. The simplest way to handle this phenomenon is to assume
that the nasal infix is added to the stem, metathesizes when it follows
an obstruent, and then assimilates to the place of articulation of
the following obstruent. A sample derivation would be the following:

    klup + N + a
    kluNpa             nasal metathesis
    klumpa             assimilation

    klumpa 'he trips'

If the nasal infix is not followed by an obstruent, as in a present-
tense form like plauna 'he washes', the nasal infix is realized as -n-.

The following rules are required:

    Obst. + N → N + Obst. / ___ +      nasal metathesis

    N → n / ___ t, d                   assimilation
    N → m / ___ p, b
    N → ŋ / ___ g, k
    N → n elsewhere
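The two nasal rules can be applied in sequence as a sketch. The capital N stands for the nasal infix, as in the derivation above; the function names are hypothetical, and the output tiŋka is phonetic (the orthography writes tinka).

```python
import re

OBSTRUENTS = "pbtdkgsšzž"


def nasal_metathesis(form):
    """Obst. + N -> N + Obst. before a morpheme boundary."""
    return re.sub(rf"([{OBSTRUENTS}])\+N\+", r"N\1+", form)


def nasal_assimilation(form):
    """N takes the place of articulation of a following stop; n elsewhere."""
    form = re.sub(r"N(?=[td])", "n", form)
    form = re.sub(r"N(?=[pb])", "m", form)
    form = re.sub(r"N(?=[kg])", "ŋ", form)
    return form.replace("N", "n")


def present_tense(underlying):
    return nasal_assimilation(nasal_metathesis(underlying)).replace("+", "")


for form in ("klup+N+a", "krit+N+a", "tik+N+a", "plau+N+a"):
    print(form, "->", present_tense(form))
```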



3. Vowel raising

As mentioned before, non-nasal -aa- becomes long o and -ee-
becomes long ė. The syncope rule, which is also required in Lithuanian,
and vowel de-nasalization, both 'clean-up' rules, would be ordered
after vowel raising.



4. Palatalization

There is very widespread palatalization of consonants in
Lithuanian; any consonant becomes palatalized in the appropriate
environment, even non-native consonants in borrowed words. For example,
filologas 'philologist' and fizika 'physics' both have palatalized f.

The rule for palatalization is:

    C → C' / ___ {front V, C'}
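Because a consonant also becomes palatalized before a palatalized consonant, the rule spreads leftward through clusters, which a right-to-left scan captures. This sketch marks palatalized consonants with an apostrophe, treats each letter as one segment (so digraphs and the orthographic -i- marker are not handled), and uses a hypothetical function name.

```python
VOWELS = set("aeiouyėąęįųū")
FRONT_VOWELS = set("eiyėęį")


def palatalize(word):
    """Mark palatalized consonants with an apostrophe.

    A consonant is palatalized before a front vowel or before a consonant
    that is itself palatalized; scanning right to left lets the feature
    spread leftward through a cluster."""
    segments = list(word)
    sharp = [False] * len(segments)
    for i in range(len(segments) - 2, -1, -1):
        if segments[i] in VOWELS:
            continue
        following = segments[i + 1]
        if following in FRONT_VOWELS or (following not in VOWELS and sharp[i + 1]):
            sharp[i] = True
    return "".join(seg + ("'" if is_sharp else "")
                   for seg, is_sharp in zip(segments, sharp))


for word in ("filologas", "fizika", "kelti"):
    print(word, "->", palatalize(word))
```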
5. Spirant assimilation

Dental spirants become palatal spirants before palatal affricates.
This is clearly indicated by a form like pėsčias 'on foot', which is
phonetically [p'e:š'č'as].



6. Final devoicing

Consonants are devoiced and de-palatalized in word-final position:

    C → [-voice, -sharp] / ___ #

Conclusion 



As is clear from the discussion of individual rules, there are
three possible relations between the rules in the two languages: a
rule is identical in the two languages, a rule has no counterpart
in the other language, or a rule has changed in some way.

Four rules appear to be identical in the two languages: the
i/j rule, vowel lengthening in the past tense, spirantization, and
voicing assimilation. The i/j rule and vowel lengthening are best
considered to be inherited rules, operating at a high level in the
phonology. Spirantization and voicing assimilation, however, are
both low-level phonological rules; voicing assimilation is preceded by
several other innovative low-level rules in Lithuanian, e.g. final
devoicing precedes voicing assimilation.

It is tempting to speculate that the status of the two sets of
rules is not the same. Though the claim cannot be substantiated here,
it may be that a certain set of rules should be viewed as defining
constraints on the shape of the phonological output, rather than
defining phonological alternations. The spirantization and voicing
assimilation rules appear to be of this 'lower level' type.

There are five rules that appear in both languages but not in
exactly the same form. These are the n/i rule, ablaut, metathesis,
vowel truncation, and dental mutation. Only dental mutation is a
'lower level' rule; the other four rules are higher-level phonological
rules. In all cases, the Latvian rules appear to be simpler, in one
way or another, than the Lithuanian rules.

Assuming that the Latvian n/i rule is an extension of the
Lithuanian rule defining long-vowel, vowel-nasal alternations, the
Latvian rule has been simplified in two ways. First, in Lithuanian the
vocalized nasal must match the preceding vowel in all features; in
Latvian, the vocalized nasal is always a high vowel, matching only in
the front-back dimension. Second, the Latvian rule specifies a
simpler environment, before any consonant, rather than the rather
complicated set of consonants required for the Lithuanian rule.

As is clear from the preceding discussion of metathesis, the
rule applies in Latvian not only to more vowel sequences but also to
more stems than in Lithuanian. Latvian, therefore, has
generalized the applicability of the rule.

If ablaut and other morphologically conditioned alternations
are considered together, then it is quite clear that the Lithuanian
system is more complex. It includes not only the two rules that
appear in Latvian but also several others: it involves more
kinds of alternations and more complicated rules to define them.

Vowel truncation is much more restricted in Lithuanian than in
Latvian. As mentioned previously, virtually all vowels in
final syllables have disappeared in Latvian, but this is by no means
the case in Lithuanian. Apparently, Latvian has extended the
applicability of the rule.

Finally, the dental mutation rule, assuming that it is basically 
the same rule in the two languages, applies to almost the whole class 
in Latvian but to only two members of the class in Lithuanian. 

Some rules appear in only one of the two languages. The
various morphologically conditioned lengthening rules of Lithuanian
have already been mentioned; these rules are historical retentions in
Lithuanian that have been lost in Latvian. The status of the two
Lithuanian consonant metathesis rules is not clear; with the data
presently at my disposal, I could not determine whether the rules are
innovations or retentions in Lithuanian. Palatalization and final
devoicing are both clearly Lithuanian innovations, probably additions
to the set of 'output condition' rules.

Latvian seems to have innovated two rules: the k/c rule and the
e/e rule. These innovations are problematic, however, in that both
rules appear at a rather early stage of the phonology. The k/c
rule and the e/e rule must precede both vowel truncation and metathesis.
For example, the environment required for the k/c rule may be deleted
by vowel truncation: audz 'you grow', from /aug + e + i/, vs. aug 'he
grows', from /aug + a/. Secondly, a form like vilki 'wolves', from
/vilk + ai/, indicates that the k/c rule precedes metathesis, since
the k/c rule does not apply when k comes to precede a front vowel only
as a result of metathesis. That the e/e rule precedes vowel truncation
is clear from the derivation of mest [mest] 'to throw', from /met + ti/;
forms that do not have a high vowel in the inflectional suffixes keep
e: metu [metu] 'I throw' and met [met] 'he throws'.
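The ordering argument for the k/c rule can be made concrete with a toy derivation. The Python sketch below is an illustration only, not the formal rule statements used in this paper: it assumes segments are single letters (plus the digraph dz), marks morpheme boundaries with +, states the k/c rule as k → c and g → dz before e or i, and states vowel truncation as deletion of every word-final vowel. These simplified statements, and all names in the code, are hypothetical; metathesis and the e/e rule are left out.

```python
# Toy model of rule ordering in Latvian (illustrative assumptions only):
# segments are letters, "+" marks a morpheme boundary, and the rule
# statements below are simplified stand-ins, not the paper's formal rules.

FRONT_VOWELS = {"e", "i"}
VOWELS = "aeiou"
KC_MAP = {"k": "c", "g": "dz"}  # assumed outputs of the k/c rule


def kc_rule(form: str) -> str:
    """k -> c, g -> dz when the next non-boundary segment is a front vowel."""
    out = []
    for idx, seg in enumerate(form):
        if seg in KC_MAP:
            following = next((s for s in form[idx + 1:] if s != "+"), "")
            if following in FRONT_VOWELS:
                out.append(KC_MAP[seg])
                continue
        out.append(seg)
    return "".join(out)


def vowel_truncation(form: str) -> str:
    """Erase boundaries, then delete every vowel at the end of the word."""
    return form.replace("+", "").rstrip(VOWELS)


def derive(underlying: str, rules) -> str:
    """Apply ordered rules one after another, as in a linear derivation."""
    form = underlying
    for rule in rules:
        form = rule(form)
    return form


PRESENT_ORDER = [kc_rule, vowel_truncation]   # k/c rule precedes truncation
BLEEDING_ORDER = [vowel_truncation, kc_rule]  # truncation removes the k/c context

if __name__ == "__main__":
    for underlying in ["aug+e+i", "aug+a"]:
        print(underlying,
              derive(underlying, PRESENT_ORDER),
              derive(underlying, BLEEDING_ORDER))
    # aug+e+i audz aug
    # aug+a aug aug
```

Under these assumptions only the order with the k/c rule first keeps the second-person form audz distinct from the third-person form aug; with truncation first, the front vowels disappear before the k/c rule can apply, so the rule is bled.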

It is not clear exactly how the two rules came to be ordered early
in the grammar. There has recently been considerable discussion of
rule insertion, summarized in King (1970). King concludes that rule
insertion (the addition of a rule which must be ordered before a
phonological rule present in an earlier stage of the grammar) is a
possible type of linguistic change, but that there are very few good
examples of it. At first glance, the Latvian k/c and e/e rules look
like examples of rule insertion; however, the rules may also appear in
their present order because of rule reordering. The two rules are
crucially ordered only with respect to vowel truncation and metathesis,
both of which have been greatly generalized in Latvian. It is
possible that the k/c and e/e rules appear early in the grammar because
of reordering from 'bleeding' to 'feeding' order. In an earlier stage
of Latvian, the rules would have applied in the following order:
metathesis, vowel truncation, k/c rule, and e/e rule. As vowel
truncation became generalized, more and more environments for the k/c
rule and e/e rule were eliminated by the deletion of final vowels, so
that the rules now operated in 'bleeding' order. At this point, the
rules were reordered to 'feeding' order. To determine which process,
rule insertion or reordering, is responsible for the present rule order
in Latvian, more evidence is necessary than is currently available to me.

The relationship between the rules in the two languages can be
summarized as follows. Latvian has simplified rules, generalized their
application, and added two high-level rules; Lithuanian has retained
complex rules which apply under complicated circumstances, and added
low-level rules.

The judgment that Latvian is innovating and Lithuanian conservative
is interesting in this context. Lithuanian preserves complex
alternations but rather freely changes their phonetic realization;
Latvian changes the phonetic realization much less, but loses complex
alternations. The observation is somewhat trivial but still worth
making: whether a phonology is conservative or innovating is not
defined in terms of surface phonetic realization.

Obviously, the rules discussed in this paper represent only a 
small fragment of Lithuanian and Latvian phonology. It seems, 
however, that a comparison of the phonological systems of the two 
languages can provide very interesting material for a study of language
change.